Showing 1–50 of 295 results for author: Loy, C

  1. arXiv:2506.13465  [pdf, ps, other]

    cs.CV eess.IV

    SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer

    Authors: Zerui Gong, Zhonghua Wu, Qingyi Tao, Qinyue Li, Chen Change Loy

    Abstract: Photorealistic style transfer (PST) enables real-world color grading by adapting reference image colors while preserving content structure. Existing methods mainly follow one of two approaches: generation-based methods that prioritize stylistic fidelity at the cost of content integrity and efficiency, or global color transformation methods such as LUT, which preserve structure but lack local adaptabil…

    Submitted 16 June, 2025; originally announced June 2025.

  2. arXiv:2506.05301  [pdf, other]

    cs.CV

    SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

    Authors: Jianyi Wang, Shanchuan Lin, Zhijie Lin, Yuxi Ren, Meng Wei, Zongsheng Yue, Shangchen Zhou, Hao Chen, Yang Zhao, Ceyuan Yang, Xuefeng Xiao, Chen Change Loy, Lu Jiang

    Abstract: Recent advances in diffusion-based video restoration (VR) demonstrate significant improvement in visual quality, yet yield a prohibitive computational cost during inference. While several distillation-based approaches have exhibited the potential of one-step image restoration, extending existing approaches to VR remains challenging and underexplored, particularly when dealing with high-resolution…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Draft Ver. Project page: https://iceclear.github.io/projects/seedvr2/

  3. arXiv:2506.03119  [pdf, ps, other]

    cs.CV

    Controllable Human-centric Keyframe Interpolation with Generative Prior

    Authors: Zujin Guo, Size Wu, Zhongang Cai, Wei Li, Chen Change Loy

    Abstract: Existing interpolation methods use pre-trained video diffusion priors to generate intermediate frames between sparsely sampled keyframes. In the absence of 3D geometric guidance, these methods struggle to produce plausible results for complex, articulated human motions and offer limited control over the synthesized dynamics. In this paper, we introduce PoseFuse3D Keyframe Interpolator (PoseFuse3D-…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Project Page: https://gseancdat.github.io/projects/PoseFuse3D_KI

  4. arXiv:2505.23661  [pdf, ps, other]

    cs.CV

    OpenUni: A Simple Baseline for Unified Multimodal Understanding and Generation

    Authors: Size Wu, Zhonghua Wu, Zerui Gong, Qingyi Tao, Sheng Jin, Qinyue Li, Wei Li, Chen Change Loy

    Abstract: In this report, we present OpenUni, a simple, lightweight, and fully open-source baseline for unifying multimodal understanding and generation. Inspired by prevailing practices in unified model learning, we adopt an efficient training strategy that minimizes the training complexity and overhead by bridging the off-the-shelf multimodal large language models (LLMs) and diffusion models through a set…

    Submitted 2 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  5. arXiv:2505.22636  [pdf, other]

    cs.CV

    ObjectClear: Complete Object Removal via Object-Effect Attention

    Authors: Jixin Zhao, Shangchen Zhou, Zhouxia Wang, Peiqing Yang, Chen Change Loy

    Abstract: Object removal requires eliminating not only the target object but also its effects, such as shadows and reflections. However, diffusion-based inpainting methods often produce artifacts, hallucinate content, alter background, and struggle to remove object effects accurately. To address this challenge, we introduce a new dataset for OBject-Effect Removal, named OBER, which provides paired images wi…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Project page: https://zjx0101.github.io/projects/ObjectClear/

  6. arXiv:2503.21979  [pdf, other]

    cs.CV

    Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

    Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy

    Abstract: Unifying visual understanding and generation within a single multimodal framework remains a significant challenge, as the two inherently heterogeneous tasks require representations at different levels of granularity. Current approaches that utilize vector quantization (VQ) or variational autoencoders (VAE) for unified visual representation prioritize intrinsic imagery features over semantics, comp…

    Submitted 22 April, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  7. arXiv:2503.08664  [pdf, other]

    cs.CV cs.AI

    MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention

    Authors: Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy

    Abstract: Multiview diffusion models have shown considerable success in image-to-3D generation for general objects. However, when applied to human data, existing methods have yet to deliver promising results, largely due to the challenges of scaling multiview attention to higher resolutions. In this paper, we explore human multiview diffusion models at the megapixel level and introduce a solution called mes…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Code https://github.com/johannwyh/MEAT Project Page https://johann.wang/MEAT/

  8. arXiv:2503.00746  [pdf, other]

    cs.CV

    DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

    Authors: Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy

    Abstract: Recent advances in 3D Gaussian Splatting (3D-GS) have shown remarkable success in representing 3D scenes and generating high-quality, novel views in real-time. However, 3D-GS and its variants assume that input images are captured based on pinhole imaging and are fully in focus. This assumption limits their applicability, as real-world images often feature shallow depth-of-field (DoF). In this pape…

    Submitted 13 March, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  9. arXiv:2501.14677  [pdf, other]

    cs.CV

    MatAnyone: Stable Video Matting with Consistent Memory Propagation

    Authors: Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao, Chen Change Loy

    Abstract: Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates…

    Submitted 25 March, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: Project page: https://pq-yang.github.io/projects/MatAnyone

  10. arXiv:2501.09782  [pdf, other]

    cs.CV cs.GR cs.HC cs.MM cs.RO

    SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

    Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Ailing Zeng, Chen Wei, Qingping Sun, Haiyi Mei, Yanjun Wang, Hui En Pang, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Atsushi Yamashita, Lei Yang, Ziwei Liu

    Abstract: Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods focus on training innovative architectural designs on confined datasets. In this work, we investigate the impact of scaling up EHPS towards a family of generalist foundation models. 1) For data scaling, we perform…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: An extension of SMPLer-X [arXiv:2309.17448]. Homepage: https://caizhongang.com/projects/SMPLer-X/

  11. arXiv:2501.07256  [pdf, other]

    cs.CV

    EdgeTAM: On-Device Track Anything Model

    Authors: Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran

    Abstract: On top of Segment Anything Model (SAM), SAM 2 further extends its capability from image to video inputs through a memory bank mechanism and obtains a remarkable performance compared with previous methods, making it a foundation model for video segmentation task. In this paper, we aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining a comparable performan…

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: Code will be released at https://github.com/facebookresearch/EdgeTAM

  12. arXiv:2501.01393  [pdf, other]

    cs.CV cs.GR

    Learning 3D Garment Animation from Trajectories of A Piece of Cloth

    Authors: Yidi Shao, Chen Change Loy, Bo Dai

    Abstract: Garment animation is ubiquitous in various applications, such as virtual reality, gaming, and film production. Recently, learning-based approaches obtain compelling performance in animating diverse garments under versatile scenarios. Nevertheless, to mimic the deformations of the observed garments, data-driven methods require large-scale garment data, which are both resource-wise expensive and t…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted by NeurIPS 2024, 16 pages

  13. arXiv:2501.01320  [pdf, other]

    cs.CV

    SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

    Authors: Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Fei Xiao, Chen Change Loy, Lu Jiang

    Abstract: Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restora…

    Submitted 22 March, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: CVPR25 CR ver., add a co-author additionally. Project page: https://iceclear.github.io/projects/seedvr/

  14. arXiv:2412.18565  [pdf, other]

    cs.CV

    3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement

    Authors: Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy

    Abstract: Despite advances in neural rendering, due to the scarcity of high-quality 3D datasets and the inherent limitations of multi-view diffusion models, view synthesis and 3D model generation are restricted to low resolutions with suboptimal multi-view consistency. In this study, we present a novel 3D enhancement pipeline, dubbed 3DEnhancer, which employs a multi-view latent diffusion model to enhance c…

    Submitted 28 April, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: Project page: https://yihangluo.com/projects/3DEnhancer

  15. arXiv:2412.17804  [pdf, other]

    cs.CV cs.GR

    GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects

    Authors: Yidi Shao, Mu Huang, Chen Change Loy, Bo Dai

    Abstract: We introduce GausSim, a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels. We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter, accounting for realistic deformations without idealized assumptions. To improve computational effi…

    Submitted 10 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: Project page: https://www.mmlab-ntu.com/project/gausim/index.html

  16. arXiv:2412.09013  [pdf, other]

    cs.CV

    Arbitrary-steps Image Super-resolution via Diffusion Inversion

    Authors: Zongsheng Yue, Kang Liao, Chen Change Loy

    Abstract: This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Central to our approach is a deep n…

    Submitted 13 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted by CVPR 2025. Project: https://github.com/zsyOAOA/InvSR

    MSC Class: NA; ACM Class: I.4.3

  17. arXiv:2412.07721  [pdf, other]

    cs.CV

    ObjCtrl-2.5D: Training-free Object Control with Camera Poses

    Authors: Zhouxia Wang, Yushi Lan, Shangchen Zhou, Chen Change Loy

    Abstract: This study aims to achieve more precise and versatile object control in image-to-video (I2V) generation. Current methods typically represent the spatial movement of target objects with 2D trajectories, which often fail to capture user intention and frequently produce unnatural results. To enhance control, we present ObjCtrl-2.5D, a training-free object control approach that uses a 3D trajectory, e…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Project Page: https://wzhouxiff.github.io/projects/ObjCtrl-2.5D/

  18. arXiv:2411.17769  [pdf, other]

    cs.CV

    Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

    Authors: Xinyu Hou, Zongsheng Yue, Xiaoming Li, Chen Change Loy

    Abstract: In this work, we introduce a single parameter, $ω$, to effectively control granularity in diffusion-based synthesis. This parameter is incorporated during the denoising steps of the diffusion model's reverse process. Our approach does not require model retraining, architectural modifications, or additional computational overhead during inference, yet enables precise control over the level of detail…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Project page: https://itsmag11.github.io/Omegance/

  19. arXiv:2411.08033  [pdf, other]

    cs.CV cs.AI cs.GR

    GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation

    Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy

    Abstract: While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencode…

    Submitted 10 April, 2025; v1 submitted 12 November, 2024; originally announced November 2024.

    Comments: ICLR 2025. Project page: https://nirvanalan.github.io/projects/GA/

  20. arXiv:2410.19424  [pdf, other]

    cs.CV

    Paint Bucket Colorization Using Anime Character Color Design Sheets

    Authors: Yuekun Dai, Qinyue Li, Shangchen Zhou, Yihang Luo, Chongyi Li, Chen Change Loy

    Abstract: Line art colorization plays a crucial role in hand-drawn animation production, where digital artists manually colorize segments using a paint bucket tool, guided by RGB values from character color design sheets. This process, often called paint bucket colorization, involves two main tasks: keyframe colorization, where colors are applied according to the character's color design sheet, and consecut…

    Submitted 25 October, 2024; originally announced October 2024.

    Comments: Extension of arXiv:2403.18342; Project page at https://github.com/ykdai/BasicPBC

  21. arXiv:2409.14379  [pdf, other]

    cs.CV

    GroupDiff: Diffusion-based Group Portrait Editing

    Authors: Yuming Jiang, Nanxuan Zhao, Qing Liu, Krishna Kumar Singh, Shuai Yang, Chen Change Loy, Ziwei Liu

    Abstract: Group portrait editing is highly desirable since users constantly want to add a person, delete a person, or manipulate existing persons. It is also challenging due to the intricate dynamics of human interactions and the diverse gestures. In this work, we present GroupDiff, a pioneering effort to tackle group photo editing with three dedicated contributions: 1) Data Engine: Since there is no labele…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: ECCV 2024

  22. arXiv:2408.05205  [pdf, other]

    cs.CV

    Kalman-Inspired Feature Propagation for Video Face Super-Resolution

    Authors: Ruicheng Feng, Chongyi Li, Chen Change Loy

    Abstract: Despite the promising progress of face image super-resolution, video face super-resolution remains relatively under-explored. Existing approaches either adapt general video super-resolution networks to face datasets or apply established face image super-resolution models independently on individual video frames. These paradigms encounter challenges either in reconstructing facial details or mainta…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024. Project page: https://jnjaby.github.io/projects/KEEP/

  23. arXiv:2407.09842  [pdf, other]

    cs.CV

    Eliminating Feature Ambiguity for Few-Shot Segmentation

    Authors: Qianxiong Xu, Guosheng Lin, Chen Change Loy, Cheng Long, Ziyue Li, Rui Zhao

    Abstract: Recent advancements in few-shot segmentation (FSS) have exploited pixel-by-pixel matching between query and support features, typically based on cross attention, which selectively activate query foreground (FG) features that correspond to the same-class support FG features. However, due to the large receptive fields in deep layers of the backbone, the extracted query and support FG features are in…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV'24

  24. arXiv:2407.08680  [pdf, other]

    cs.CV

    Generalizable Implicit Motion Modeling for Video Frame Interpolation

    Authors: Zujin Guo, Wei Li, Chen Change Loy

    Abstract: Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we int…

    Submitted 10 February, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://gseancdat.github.io/projects/GIMMVFI

  25. arXiv:2406.19389  [pdf, other]

    cs.CV

    OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

    Authors: Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan

    Abstract: Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual p…

    Submitted 1 October, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: NeurIPS-2024. Project page: https://lxtgh.github.io/project/omg_llava/

  26. arXiv:2406.19369  [pdf, other]

    cs.CV

    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

    Authors: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

    Abstract: Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifica…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 16 pages; 8 figures

  27. arXiv:2406.18516  [pdf, other]

    cs.CV

    Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

    Authors: Kang Liao, Zongsheng Yue, Zhouxia Wang, Chen Change Loy

    Abstract: Although learning-based image restoration methods have made significant progress, they still struggle with limited generalization to real-world scenarios due to the substantial domain gap caused by training on synthetic data. Existing methods address this issue by improving data synthesis pipelines, estimating degradation kernels, employing deep internal learning, and performing domain adaptation…

    Submitted 19 February, 2025; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by ICLR 2025. Project Page: https://kangliao929.github.io/projects/noise-da/

  28. arXiv:2406.12805  [pdf, other]

    cs.CV

    AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

    Authors: Xinyu Hou, Xiaoming Li, Chen Change Loy

    Abstract: Despite the high-quality results of text-to-image generation, stereotypical biases have been spotted in their generated contents, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs. Unlike existing de-biasing approaches, our method requires neither explicit attribute specifi…

    Submitted 18 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  29. arXiv:2406.07006  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan, et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  30. arXiv:2406.05821  [pdf, other]

    cs.CV

    F-LMM: Grounding Frozen Large Multimodal Models

    Authors: Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

    Abstract: Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs' understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation tokens and overfit grounding and segmentation datasets. Such a design would inevitably cause a catastrophic diminution in the…

    Submitted 11 April, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/wusize/F-LMM

  31. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  32. arXiv:2405.02859  [pdf, other]

    cs.CV

    MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior

    Authors: Honghua Chen, Chen Change Loy, Xingang Pan

    Abstract: Despite the emergence of successful NeRF inpainting methods built upon explicit RGB and depth 2D inpainting supervisions, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. This is due to two key reasons: (i) independently inpainting constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 14 pages, 10 figures, conference

  33. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  34. arXiv:2404.12352  [pdf, other]

    cs.CV

    Point-In-Context: Understanding Point Cloud via In-Context Learning

    Authors: Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

    Abstract: With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context l…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project page: https://fanglaosi.github.io/Point-In-Context_Pages. arXiv admin note: text overlap with arXiv:2306.08659

  35. arXiv:2404.10716  [pdf, other]

    cs.CV

    MOWA: Multiple-in-One Image Warping Model

    Authors: Kang Liao, Zongsheng Yue, Zhonghua Wu, Chen Change Loy

    Abstract: While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in practice, we propose a Multiple-in-One image WArping model (named MOWA) in this work. Specifically, we mitigate the diffi…

    Submitted 3 May, 2025; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted to TPAMI. Project page: https://kangliao929.github.io/projects/mowa/

  36. arXiv:2403.18811  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

    Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

    Abstract: We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between t…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  37. arXiv:2403.18342  [pdf, other]

    cs.CV

    Learning Inclusion Matching for Animation Paint Bucket Colorization

    Authors: Yuekun Dai, Shangchen Zhou, Qinyue Li, Chongyi Li, Chen Change Loy

    Abstract: Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migr…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://ykdai.github.io/projects/InclusionMatching

  38. arXiv:2403.12962  [pdf, other]

    cs.CV

    FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

    Authors: Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

    Abstract: The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains. Zero-shot methods seek to extend image diffusion models to videos without necessitating model training. Recent methods mainly focus on incorporating inter-frame correspondence into attention mechanisms. However, the soft constraint imposed on determining wh…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: CVPR 24, Code: https://github.com/williamyang1991/FRESCO, Project: https://www.mmlab-ntu.com/project/fresco/

  39. arXiv:2403.12019  [pdf, other]

    cs.CV

    LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

    Authors: Yushi Lan, Fangzhou Hong, Shuai Yang, Shangchen Zhou, Xuyi Meng, Bo Dai, Xingang Pan, Chen Change Loy

    Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a unified 3D diffusion pipeline remains unsettled. This paper introduces a novel framework called LN3Diff to address this gap and enable fast, high-quality, and generic conditional 3D generation. Our approach harn…

    Submitted 10 August, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: ECCV 2024 Camera ready version. project webpage: https://nirvanalan.github.io/projects/ln3diff/ Code: https://github.com/NIRVANALAN/LN3Diff

  40. arXiv:2403.09616  [pdf, other]

    cs.CV

    Explore In-Context Segmentation via Latent Diffusion Models

    Authors: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan

    Abstract: In-context segmentation has drawn increasing attention with the advent of vision foundation models. Its goal is to segment objects using given reference images. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. This work approaches the problem from a fresh perspective - unlocking the capability of the la…

    Submitted 9 March, 2025; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: AAAI 2025

  41. arXiv:2403.07319  [pdf, other]

    cs.CV

    Efficient Diffusion Model for Image Restoration by Residual Shifting

    Authors: Zongsheng Yue, Jianyi Wang, Chen Change Loy

    Abstract: While diffusion-based image restoration (IR) methods have achieved remarkable success, they are still limited by the low inference speed attributed to the necessity of executing hundreds or even thousands of sampling steps. Existing acceleration sampling techniques, though seeking to expedite the process, inevitably sacrifice performance to some extent, resulting in over-blurry restored outcomes.…

    Submitted 22 November, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Corrected a typo. TPAMI@2024. Code: https://github.com/zsyOAOA/ResShift

    MSC Class: I.4.4

  42. arXiv:2403.01560  [pdf, other]

    cs.CV

    Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

    Authors: Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng

    Abstract: Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneering works have proposed to adapt the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work delves into an intriguing question: Can CLIP-based video learners effect…

    Submitted 24 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  43. arXiv:2402.10855  [pdf, other]

    cs.CV

    Control Color: Multimodal Diffusion-based Interactive Image Colorization

    Authors: Zhexin Liang, Zhaochen Li, Shangchen Zhou, Chongyi Li, Chen Change Loy

    Abstract: Despite the existence of numerous colorization methods, several limitations still exist, such as lack of user interaction, inflexibility in local colorization, unnatural color rendering, insufficient color variation, and color overflow. To solve these issues, we introduce Control Color (CtrlColor), a multi-modal colorization method that leverages the pre-trained Stable Diffusion (SD) model, offeri…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Project Page: https://zhexinliang.github.io/Control_Color/; Demo Video: https://youtu.be/tSCwA-srl8Q

  44. arXiv:2401.10229  [pdf, other]

    cs.CV

    OMG-Seg: Is One Model Good Enough For All Segmentation?

    Authors: Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

    Abstract: In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentati…

    Submitted 1 October, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: CVPR-2024. Project Page: https://lxtgh.github.io/project/omg_seg/

  45. arXiv:2401.10226  [pdf, other]

    cs.CV

    Towards Language-Driven Video Inpainting via Multimodal Large Language Models

    Authors: Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

    Abstract: We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process. This approach overcomes the limitations of traditional video inpainting methods that depend on manually labeled binary masks, a process often tedious and labor-intensive. We present the Remove Objects from Videos by Instructions (ROVI) dataset, containing 5,650 vid…

    Submitted 1 October, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: CVPR-2024. Project Page: https://jianzongwu.github.io/projects/rovi

  46. Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

    Authors: Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy

    Abstract: The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into a unified framework. Specifically, we introduce the Open-Vocabulary SAM, a SAM-inspired model designed…

    Submitted 13 September, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by ECCV 2024; Project page: https://www.mmlab-ntu.com/project/ovsam; Code: https://github.com/HarborYuan/ovsam

  47. arXiv:2312.11376  [pdf, other]

    cs.CV

    CLIM: Contrastive Language-Image Mosaic for Region Representation

    Authors: Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Wentao Liu, Chen Change Loy

    Abstract: Detecting objects accurately from a large or open vocabulary necessitates the vision-language alignment on region representations. However, learning such a region-text alignment by obtaining high-quality box annotations with text labels or descriptions is expensive and infeasible. In contrast, collecting image-text pairs is simpler but lacks precise object location information to associate regions…

    Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

  48. arXiv:2312.06660  [pdf, other]

    cs.CV

    EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

    Authors: Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai

    Abstract: This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model (SAM), optimized for efficient execution on edge devices with minimal compromise in performance. Our approach involves distilling the original ViT-based SAM image encoder into a purely CNN-based architecture, better suited for edge devices. We carefully benchmark various distillation strategies and demonstrate that t…

    Submitted 19 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project page: https://mmlab-ntu.github.io/project/edgesam/

  49. arXiv:2312.06640  [pdf, other]

    cs.CV

    Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution

    Authors: Shangchen Zhou, Peiqing Yang, Jianyi Wang, Yihang Luo, Chen Change Loy

    Abstract: Text-based diffusion models have exhibited remarkable success in generation and editing, showing great promise for enhancing visual content with their generative prior. However, applying these models to video super-resolution remains challenging due to the high demands for output fidelity and temporal consistency, which is complicated by the inherent randomness in diffusion models. Our study intro…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Equal contributions from first two authors. Project page: https://shangchenzhou.com/projects/upscale-a-video/

  50. arXiv:2312.04547  [pdf, other]

    cs.CV cs.AI cs.GR cs.HC

    Digital Life Project: Autonomous 3D Characters with Social Intelligence

    Authors: Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

    Abstract: In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models perso…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Homepage: https://digital-life-project.com/