Showing 151–200 of 295 results for author: Loy, C

  1. arXiv:2111.06849  [pdf, other]

    cs.CV cs.LG

    Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited Data

    Authors: Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy

    Abstract: Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images. Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting, the underlying cause that impedes the generator's convergence. This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage h…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021. Code: https://github.com/EndlessSora/DeceiveD Project page: https://www.mmlab-ntu.com/project/apa/index.html

  2. arXiv:2111.00763  [pdf, other]

    cs.CV

    Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements

    Authors: Yu Rong, Jingbo Wang, Ziwei Liu, Chen Change Loy

    Abstract: 3D interacting hand reconstruction is essential to facilitate human-machine interaction and human behaviors understanding. Previous works in this field either rely on auxiliary inputs such as depth images or they can only handle a single hand if monocular single RGB images are used. Single-hand methods tend to generate collided hand meshes, when applied to closely interacting hands, since they can…

    Submitted 1 November, 2021; originally announced November 2021.

    Comments: Accepted to 3DV 2021. Code and demo are available at https://penincillin.github.io/ihmr_3dv2021

  3. arXiv:2110.15678  [pdf, other]

    cs.CV

    A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

    Authors: Xingang Pan, Xudong Xu, Chen Change Loy, Christian Theobalt, Bo Dai

    Abstract: The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite the progress, they often fall short of capturing accurate 3D shapes due to the shap…

    Submitted 8 December, 2021; v1 submitted 29 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS2021. We proposed ShadeGAN, which could perform shape-accurate 3D-aware image synthesis by modeling shading in generative implicit models

  4. arXiv:2110.13107  [pdf, other]

    cs.CV

    The Nuts and Bolts of Adopting Transformer in GANs

    Authors: Rui Xu, Xiangyu Xu, Kai Chen, Bolei Zhou, Chen Change Loy

    Abstract: Transformer becomes prevalent in computer vision, especially for high-level vision tasks. However, adopting Transformer in the generative adversarial network (GAN) framework is still an open yet challenging problem. In this paper, we conduct a comprehensive empirical study to investigate the properties of Transformer in GAN for high-fidelity image synthesis. Our analysis highlights and reaffirms t…

    Submitted 13 June, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: CVPR2023 Workshop AI4CC. Project Page: https://nbei.github.io/stransgan.html

  5. arXiv:2110.09327  [pdf, other]

    cs.LG cs.CV stat.ML

    Self-Supervised Representation Learning: Introduction, Advances and Challenges

    Authors: Linus Ericsson, Henry Gouk, Chen Change Loy, Timothy M. Hospedales

    Abstract: Self-supervised representation learning methods aim to provide powerful deep feature learning without the requirement of large annotated datasets, thus alleviating the annotation bottleneck that is one of the main barriers to practical deployment of deep learning today. These methods have advanced rapidly in recent years, with their efficacy approaching and sometimes surpassing fully supervised pr…

    Submitted 18 October, 2021; originally announced October 2021.

  6. arXiv:2110.07588  [pdf, other]

    cs.CV

    Playing for 3D Human Recovery

    Authors: Zhongang Cai, Mingyuan Zhang, Jiawei Ren, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu

    Abstract: Image- and video-based 3D human recovery (i.e., pose and shape estimation) have achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity. In this work, we obtain massive human sequences by playing the video game with automatically annotated 3D ground truths. Specifically, we contribute GTA-Human, a large-scale…

    Submitted 8 September, 2024; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Homepage: https://caizhongang.github.io/projects/GTA-Human/

  7. arXiv:2110.04562  [pdf, other]

    cs.CV eess.IV

    Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning

    Authors: Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong

    Abstract: Video colorization is a challenging and highly ill-posed problem. Although recent years have witnessed remarkable progress in single image colorization, there is relatively less research effort on video colorization and existing methods always suffer from severe flickering artifacts (temporal inconsistency) or unsatisfying colorization performance. We address this problem from a new perspective, b…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: 13 pages, 10 figures

  8. arXiv:2109.04760  [pdf, other]

    eess.IV cs.CV

    ReconfigISP: Reconfigurable Camera Image Processing Pipeline

    Authors: Ke Yu, Zexian Li, Yue Peng, Chen Change Loy, Jinwei Gu

    Abstract: Image Signal Processor (ISP) is a crucial component in digital cameras that transforms sensor signals into images for us to perceive and understand. Existing ISP designs always adopt a fixed architecture, e.g., several sequential modules connected in a rigid order. Such a fixed ISP architecture may be suboptimal for real-world applications, where camera sensors, scenes and tasks are diverse. In th…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  9. arXiv:2109.04425  [pdf, other]

    cs.CV

    Talk-to-Edit: Fine-Grained Facial Editing via Dialog

    Authors: Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu

    Abstract: Facial editing is an important task in vision and graphics with numerous applications. However, existing works are incapable to deliver a continuous and fine-grained editing mode (e.g., editing a slightly smiling face to a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manip…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: To appear in ICCV2021. Project Page: https://www.mmlab-ntu.com/project/talkedit/, Code: https://github.com/yumingj/Talk-to-Edit

  10. arXiv:2109.02563  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.MM

    3D Human Texture Estimation from a Single Image with Transformers

    Authors: Xiangyu Xu, Chen Change Loy

    Abstract: We propose a Transformer-based framework for 3D human texture estimation from a single image. The proposed Transformer is able to effectively exploit the global information of the input image, overcoming the limitations of existing methods that are solely based on convolutional neural networks. In addition, we also propose a mask-fusion strategy to combine the advantages of the RGB-based and textu…

    Submitted 6 September, 2021; originally announced September 2021.

    Comments: ICCV 2021 Oral, Project: https://www.mmlab-ntu.com/project/texformer, Code: https://github.com/xuxy09/Texformer

    Journal ref: IEEE International Conference on Computer Vision, 2021

  11. Learning to Prompt for Vision-Language Models

    Authors: Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

    Abstract: Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task…

    Submitted 6 October, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: International Journal of Computer Vision (IJCV), 2022. Update: Adds results on the DOSCO (DOmain Shift in COntext) benchmark

  12. arXiv:2106.14855  [pdf, other]

    cs.CV cs.AI

    K-Net: Towards Unified Image Segmentation

    Authors: Wenwei Zhang, Jiangmiao Pang, Kai Chen, Chen Change Loy

    Abstract: Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsib…

    Submitted 1 November, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: Camera ready for NeurIPS2021

  13. arXiv:2106.11952  [pdf, other]

    cs.CV

    Unsupervised Object-Level Representation Learning from Scene Images

    Authors: Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

    Abstract: Contrastive self-supervised learning has largely narrowed the gap to supervised pre-training on ImageNet. However, its success highly relies on the object-centric priors of ImageNet, i.e., different augmented views of the same image correspond to the same object. Such a heavily curated constraint becomes immediately infeasible when pre-trained on more complex scene images with many objects. To ove…

    Submitted 3 December, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Project page: https://www.mmlab-ntu.com/project/orl/ Code: https://github.com/Jiahao000/ORL

  14. arXiv:2106.01863  [pdf, other]

    cs.CV cs.LG eess.IV

    Robust Reference-based Super-Resolution via C2-Matching

    Authors: Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

    Abstract: Reference-based Super-Resolution (Ref-SR) has recently emerged as a promising paradigm to enhance a low-resolution (LR) input image by introducing an additional high-resolution (HR) reference image. Existing Ref-SR methods mostly rely on implicit correspondence matching to borrow HR textures from reference images to compensate for the information loss in input images. However, performing local tra…

    Submitted 3 June, 2021; originally announced June 2021.

    Comments: To appear in CVPR2021. The source code is available at https://github.com/yumingj/C2-Matching

  15. arXiv:2106.00592  [pdf, other]

    cs.CV cs.AI cs.LG

    Semi-Supervised Domain Generalization with Stochastic StyleMatch

    Authors: Kaiyang Zhou, Chen Change Loy, Ziwei Liu

    Abstract: Ideally, visual learning algorithms should be generalizable, for dealing with any unseen domain shift when deployed in a new target environment; and data-efficient, for reducing development costs by using as little labels as possible. To this end, we study semi-supervised domain generalization (SSDG), which aims to learn a domain-generalizable model using multi-source, partially-labeled training d…

    Submitted 15 December, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: Tech report. Code available at https://github.com/KaiyangZhou/ssdg-benchmark

  16. arXiv:2104.13371  [pdf, other]

    cs.CV

    BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment

    Authors: Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy

    Abstract: A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the rec…

    Submitted 27 April, 2021; originally announced April 2021.

    Comments: 3 champions and 1 runner-up in NTIRE 2021

  17. arXiv:2104.13366  [pdf, other]

    cs.CV

    Unsupervised 3D Shape Completion through GAN Inversion

    Authors: Junzhe Zhang, Xinyi Chen, Zhongang Cai, Liang Pan, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Bo Dai, Chen Change Loy

    Abstract: Most 3D shape completion approaches rely heavily on partial-complete shape pairs and learn in a fully supervised manner. Despite their impressive performances on in-domain data, when generalizing to partial shapes in other forms or real-world partial scans, they often obtain unsatisfactory results due to domain gaps. In contrast to previous fully supervised approaches, in this paper we present Sha…

    Submitted 29 April, 2021; v1 submitted 27 April, 2021; originally announced April 2021.

    Comments: Accepted in CVPR 2021, project webpage: https://junzhezhang.github.io/projects/ShapeInversion/

  18. arXiv:2104.11116  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS eess.IV

    Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

    Authors: Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

    Abstract: While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains. Previous methods rely on pre-estimated structural information such as landmarks and 3D parameters, aiming to generate personalized rhythmic movements. However, the inaccuracy of such estimated information under extreme condi…

    Submitted 22 April, 2021; originally announced April 2021.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Code and models are available at https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS

  19. arXiv:2104.10781  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

    Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

    Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at…

    Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: Corrected the MOS values in Table 2, and corrected some minor typos

  20. arXiv:2104.10729  [pdf, other]

    cs.CV

    Low-Light Image and Video Enhancement Using Deep Learning: A Survey

    Authors: Chongyi Li, Chunle Guo, Linghao Han, Jun Jiang, Ming-Ming Cheng, Jinwei Gu, Chen Change Loy

    Abstract: Low-light image enhancement (LLIE) aims at improving the perception or interpretability of an image captured in an environment with poor illumination. Recent advances in this area are dominated by deep learning-based solutions, where many learning strategies, network structures, loss functions, training data, etc. have been employed. In this paper, we provide a comprehensive survey to cover variou…

    Submitted 5 November, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Journal ref: TPAMI 2021

  21. arXiv:2104.09556  [pdf, other]

    cs.CV eess.IV

    Removing Diffraction Image Artifacts in Under-Display Camera via Dynamic Skip Connection Network

    Authors: Ruicheng Feng, Chongyi Li, Huaijin Chen, Shuai Li, Chen Change Loy, Jinwei Gu

    Abstract: Recent development of Under-Display Camera (UDC) systems provides a true bezel-less and notch-free viewing experience on smartphones (and TV, laptops, tablets), while allowing images to be captured from the selfie camera embedded underneath. In a typical UDC system, the microstructure of the semi-transparent organic light-emitting diode (OLED) pixel array attenuates and diffracts the incident ligh…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: CVPR 2021 camera-ready version

  22. arXiv:2104.07452  [pdf, other]

    cs.CV

    Audio-Driven Emotional Video Portraits

    Authors: Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu

    Abstract: Despite previous success in generating audio-driven talking heads, most of the previous studies focus on the correlation between speech content and the mouth shape. Facial emotion, which is one of the most important features on natural human faces, is always neglected in their methods. In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits…

    Submitted 19 May, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR2021

  23. arXiv:2104.03061  [pdf, other]

    cs.CV

    Everything's Talkin': Pareidolia Face Reenactment

    Authors: Linsen Song, Wayne Wu, Chaoyou Fu, Chen Qian, Chen Change Loy, Ran He

    Abstract: We present a new application direction named Pareidolia Face Reenactment, which is defined as animating a static illusory face to move in tandem with a human face in the video. For the large differences between pareidolia face reenactment and traditional human face reenactment, two main challenges are introduced, i.e., shape variance and texture variance. In this work, we propose a novel Parametri…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR2021

  24. arXiv:2104.02495  [pdf, other]

    cs.CV

    Deep Animation Video Interpolation in the Wild

    Authors: Li Siyao, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris N. Metaxas, Chen Change Loy, Ziwei Liu

    Abstract: In the animation industry, cartoon videos are usually produced at low frame rate since hand drawing of such frames is costly and time-consuming. Therefore, it is desirable to develop computational models that can automatically interpolate the in-between animation frames. However, existing video interpolation methods fail to produce satisfying results on animation data. Compared to natural videos,…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR21

  25. Domain Generalization: A Survey

    Authors: Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, Chen Change Loy

    Abstract: Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over t…

    Submitted 12 August, 2022; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

  26. arXiv:2103.01847  [pdf, other]

    cs.CV

    Network Pruning via Resource Reallocation

    Authors: Yuenan Hou, Zheng Ma, Chunxiao Liu, Zhe Wang, Chen Change Loy

    Abstract: Channel pruning is broadly recognized as an effective approach to obtain a small compact model through eliminating unimportant channels from a large cumbersome network. Contemporary methods typically perform iterative pruning procedure from the original over-parameterized model, which is both tedious and expensive especially when the pruning is aggressive. In this paper, we propose a simple yet ef…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 12 pages, 11 figures, 7 tables

  27. arXiv:2103.00860  [pdf, other]

    cs.CV

    Learning to Enhance Low-Light Image via Zero-Reference Deep Curve Estimation

    Authors: Chongyi Li, Chunle Guo, Chen Change Loy

    Abstract: This paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network. Our method trains a lightweight deep network, DCE-Net, to estimate pixel-wise and high-order curves for dynamic range adjustment of a given image. The curve estimation is specially designed, considering pixel value…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: This work is an extension of our earlier conference version arXiv:2001.06826 (Zero-DCE) that has appeared in CVPR2020. This paper was accepted by TPAMI

  28. arXiv:2102.12867  [pdf, other]

    cs.CV

    FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation

    Authors: Yuhang Zang, Chen Huang, Chen Change Loy

    Abstract: Recent methods for long-tailed instance segmentation still struggle on rare object classes with few training data. We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data scarcity issue by augmenting the feature space especially for rare classes. Both the Feature Augmentation (FA) and feature sampling components are adaptive to the act…

    Submitted 30 September, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

  29. arXiv:2102.09471  [pdf, other]

    cs.CV cs.LG

    DeeperForensics Challenge 2020 on Real-World Face Forgery Detection: Methods and Results

    Authors: Liming Jiang, Zhengkui Guo, Wayne Wu, Zhaoyang Liu, Ziwei Liu, Chen Change Loy, Shuo Yang, Yuanjun Xiong, Wei Xia, Baoying Chen, Peiyu Zhuang, Sili Li, Shen Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Liujuan Cao, Rongrong Ji, Changlei Lu, Ganchao Tan

    Abstract: This paper reports methods and results in the DeeperForensics Challenge 2020 on real-world face forgery detection. The challenge employs the DeeperForensics-1.0 dataset, one of the most extensive publicly available real-world face forgery detection datasets, with 60,000 videos constituted by a total of 17.6 million frames. The model evaluation is conducted online on a high-quality hidden test set…

    Submitted 18 February, 2021; originally announced February 2021.

    Comments: Technical report. Challenge website: https://competitions.codalab.org/competitions/25228

  30. arXiv:2012.14739  [pdf, other]

    cs.CV

    Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory

    Authors: Yu Rong, Ziwei Liu, Chen Change Loy

    Abstract: Deep neural networks have achieved great progress in single-image 3D human reconstruction. However, existing methods still fall short in predicting rare poses. The reason is that most of the current models perform regression based on a single human prototype, which is similar to common poses while far from the rare poses. In this work, we 1) identify and analyze this learning obstacle and 2) propo…

    Submitted 29 December, 2020; originally announced December 2020.

  31. arXiv:2012.12821  [pdf, other]

    cs.CV cs.LG eess.IV

    Focal Frequency Loss for Image Reconstruction and Synthesis

    Authors: Liming Jiang, Bo Dai, Wayne Wu, Chen Change Loy

    Abstract: Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps could still exist between the real and generated images, especially in the frequency domain. In this study, we show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further. We propose a novel focal frequency lo…

    Submitted 23 August, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: ICCV 2021. GitHub: https://github.com/EndlessSora/focal-frequency-loss Project page: https://www.mmlab-ntu.com/project/ffl/index.html

  32. arXiv:2012.12741  [pdf, other]

    cs.CV cs.AI

    Exploring Data Augmentation for Multi-Modality 3D Object Detection

    Authors: Wenwei Zhang, Zhe Wang, Chen Change Loy

    Abstract: It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better or sometimes worse than approaches that solely use point cloud. This paper investigates the reason behind this phenomenon. Due to the fact that multi-modality data augmentation must maintain consistency between point cloud and images, recent methods in this field typically use relativ…

    Submitted 21 April, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Technical Report

  33. arXiv:2012.09413  [pdf, other]

    cs.CV

    Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup

    Authors: Guodong Xu, Ziwei Liu, Chen Change Loy

    Abstract: Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an essential technique for model compression and transfer learning. Unlike previous works that focus on the accuracy of student network, here we study a little-explored but important question, i.e., knowledge distillation efficiency. Our goal is t…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: The code is available at: https://github.com/xuguodong03/UNIXKD

  34. arXiv:2012.05217  [pdf, other]

    cs.CV

    Positional Encoding as Spatial Inductive Bias in GANs

    Authors: Rui Xu, Xintao Wang, Kai Chen, Bolei Zhou, Chen Change Loy

    Abstract: SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: paper with appendix, project page: https://nbei.github.io/gan-pos-encoding.html

  35. arXiv:2012.04733  [pdf, other]

    cs.CV

    CARAFE++: Unified Content-Aware ReAssembly of FEatures

    Authors: Jiaqi Wang, Kai Chen, Rui Xu, Ziwei Liu, Chen Change Loy, Dahua Lin

    Abstract: Feature reassembly, i.e. feature downsampling and upsampling, is a key operation in a number of modern convolutional network architectures, e.g., residual networks and feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightwei…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: Technical Report. Extended journal version of the conference paper that appeared as arXiv:1905.02188

  36. arXiv:2012.02181  [pdf, other]

    cs.CV

    BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

    Authors: Kelvin C. K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy

    Abstract: Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing som…

    Submitted 7 April, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

    Comments: CVPR 2021 camera-ready

  37. arXiv:2012.00739  [pdf, other]

    cs.CV

    GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution

    Authors: Kelvin C. K. Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, Chen Change Loy

    Abstract: We show that pre-trained Generative Adversarial Networks (GANs), e.g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR). While most existing SR approaches attempt to generate realistic textures through learning with adversarial loss, our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: Tech report, 19 pages, 19 figures. A high-resolution version of this paper can be found at https://ckkelvinchan.github.io/

  38. arXiv:2011.00844  [pdf, other]

    cs.CV

    Do 2D GANs Know 3D Shape? Unsupervised 3D shape reconstruction from 2D Image GANs

    Authors: Xingang Pan, Bo Dai, Ziwei Liu, Chen Change Loy, Ping Luo

    Abstract: Natural images are projections of 3D objects on a 2D image plane. While state-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold, it is unclear whether they implicitly capture the underlying 3D object structures. And if so, how could we exploit such knowledge to recover the 3D shapes of objects in the images? To answer these questions, in th…

    Submitted 11 March, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted to ICLR2021 as an oral presentation. Unsupervised 3D reconstruction via 2D GANs

  39. arXiv:2010.13412  [pdf, other]

    cs.CV

    Flexible Piecewise Curves Estimation for Photo Enhancement

    Authors: Chongyi Li, Chunle Guo, Qiming Ai, Shangchen Zhou, Chen Change Loy

    Abstract: This paper presents a new method, called FlexiCurve, for photo enhancement. Unlike most existing methods that perform image-to-image mapping, which requires expensive pixel-wise reconstruction, FlexiCurve takes an input image and estimates global curves to adjust the image. The adjustment curves are specially designed for performing piecewise mapping, taking nonlinear adjustment and differentiabil…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: 16 pages

  40. arXiv:2009.13240  [pdf, other]

    cs.CV cs.LG eess.IV

    Texture Memory-Augmented Deep Patch-Based Image Inpainting

    Authors: Rui Xu, Minghao Guo, Jiaqi Wang, Xiaoxiao Li, Bolei Zhou, Chen Change Loy

    Abstract: Patch-based methods and deep networks have been employed to tackle image inpainting problem, with their own strengths and weaknesses. Patch-based methods are capable of restoring a missing region with high-quality texture through searching nearest neighbor patches from the unmasked regions. However, these methods bring problematic contents when recovering large missing regions. Deep networks, on t…

    Submitted 4 November, 2021; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Published on TIP. Project Page: https://nbei.github.io/tmad.html

  41. arXiv:2009.07265  [pdf, other]

    cs.CV

    Understanding Deformable Alignment in Video Super-Resolution

    Authors: Kelvin C. K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy

    Abstract: Deformable convolution, originally proposed for the adaptation to geometric variations of objects, has recently shown compelling performance in aligning multiple frames and is increasingly adopted for video super-resolution. Despite its remarkable performance, its underlying mechanism for alignment remains unclear. In this study, we carefully investigate the relation between deformable alignment a…

    Submitted 15 September, 2020; originally announced September 2020.

    Comments: Tech report, 15 pages, 19 figures

  42. arXiv:2008.11702  [pdf, other]

    cs.CV cs.LG

    Delving into Inter-Image Invariance for Unsupervised Visual Representations

    Authors: Jiahao Xie, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy

    Abstract: Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less exp…

    Submitted 15 September, 2022; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: International Journal of Computer Vision (IJCV), 2022

  43. arXiv:2008.10032  [pdf, other]

    cs.CV

    Seesaw Loss for Long-Tailed Instance Segmentation

    Authors: Jiaqi Wang, Wenwei Zhang, Yuhang Zang, Yuhang Cao, Jiangmiao Pang, Tao Gong, Kai Chen, Ziwei Liu, Chen Change Loy, Dahua Lin

    Abstract: Instance segmentation has witnessed a remarkable progress on class-balanced benchmarks. However, they fail to perform as accurately in real-world scenarios, where the category distribution of objects naturally comes with a long tail. Instances of head classes dominate a long-tailed dataset and they serve as negative samples of tail categories. The overwhelming gradients of negative samples on tail…

    Submitted 17 June, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: CVPR 2021 Camera Ready

  44. arXiv:2007.14878  [pdf, other]

    cs.CV

    MessyTable: Instance Association in Multiple Camera Views

    Authors: Zhongang Cai, Junzhe Zhang, Daxuan Ren, Cunjun Yu, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Chen Change Loy

    Abstract: We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views. Each scene in this dataset is highly complex, containing multiple object instances that could be identical, stacked and occluded by other instances. The key challenge is to associate all instances given the RGB image of all views. The seemingly simple task…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: Accepted in ECCV 2020

  45. arXiv:2007.12072  [pdf, other]

    cs.CV cs.LG eess.IV

    TSIT: A Simple and Versatile Framework for Image-to-Image Translation

    Authors: Liming Jiang, Changxu Zhang, Mingyang Huang, Chunxiao Liu, Jianping Shi, Chen Change Loy

    Abstract: We introduce a simple and versatile framework for image-to-image translation. We unearth the importance of normalization layers, and provide a carefully designed two-stream generative model with newly proposed feature transformations in a coarse-to-fine fashion. This allows multi-scale semantic structure information and style representation to be effectively captured and fused by the network, perm…

    Submitted 25 July, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: ECCV 2020 (Spotlight). Table 2 is updated. GitHub: https://github.com/EndlessSora/TSIT

  46. arXiv:2007.09319  [pdf, other]

    cs.CV

    LiteFlowNet3: Resolving Correspondence Ambiguity for More Accurate Optical Flow Estimation

    Authors: Tak-Wai Hui, Chen Change Loy

    Abstract: Deep learning approaches have achieved great success in addressing the problem of optical flow estimation. The keys to success lie in the use of cost volume and coarse-to-fine flow inference. However, the matching problem becomes ill-posed when partially occluded or homogeneous regions exist in images. This causes a cost volume to contain outliers and affects the flow decoding from it. Besides, th…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted to ECCV 2020. Trained models and code package are available at https://github.com/twhui/LiteFlowNet3

  47. arXiv:2007.07051  [pdf, other]

    cs.CV

    RGB-D Salient Object Detection with Cross-Modality Modulation and Selection

    Authors: Chongyi Li, Runmin Cong, Yongri Piao, Qianqian Xu, Chen Change Loy

    Abstract: We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD). The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features. First, we propose a…

    Submitted 14 July, 2020; originally announced July 2020.

    Comments: ECCV2020

  48. arXiv:2006.16673  [pdf, other]

    cs.CV

    Cross-Scale Internal Graph Neural Network for Image Super-Resolution

    Authors: Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Chen Change Loy

    Abstract: Non-local self-similarity in natural images has been well studied as an effective prior in image restoration. However, for single image super-resolution (SISR), most existing deep non-local methods (e.g., non-local neural networks) only exploit similar patches within the same scale of the low-resolution (LR) input image. Consequently, the restoration is limited to using the same-scale information…

    Submitted 20 October, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020

  49. arXiv:2006.10645  [pdf, other]

    cs.CV cs.LG

    Online Deep Clustering for Unsupervised Representation Learning

    Authors: Xiaohang Zhan, Jiahao Xie, Ziwei Liu, Yew Soon Ong, Chen Change Loy

    Abstract: Joint clustering and feature learning methods have shown remarkable performance in unsupervised representation learning. However, the training schedule alternating between feature clustering and network parameters update leads to unstable learning of visual representations. To overcome this challenge, we propose Online Deep Clustering (ODC) that performs clustering and network update simultaneousl…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted by CVPR 2020. Code: https://github.com/open-mmlab/OpenSelfSup

  50. arXiv:2006.07114  [pdf, other]

    cs.CV

    Knowledge Distillation Meets Self-Supervision

    Authors: Guodong Xu, Ziwei Liu, Xiaoxiao Li, Chen Change Loy

    Abstract: Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an important technique for model compression and transfer learning. Unlike previous works that exploit architecture-specific cues such as activation and attention for distillation, here we wish to explore a more general and model-agnostic approach…

    Submitted 13 July, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: To appear in ECCV 2020. Code is available at: https://github.com/xuguodong03/SSKD