Showing 1–50 of 81 results for author: Lau, R

  1. arXiv:2506.22432  [pdf, ps, other]

    cs.CV

    Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy

    Authors: Yuhao Liu, Tengfei Wang, Fang Liu, Zhenwei Wang, Rynson W. H. Lau

    Abstract: Recent advances in deep generative modeling have unlocked unprecedented opportunities for video synthesis. In real-world applications, however, users often seek tools to faithfully realize their creative editing intentions with precise and consistent control. Despite the progress achieved by existing methods, ensuring fine-grained alignment with user intentions remains an open and challenging prob…

    Submitted 27 June, 2025; originally announced June 2025.

  2. arXiv:2506.12617  [pdf, ps, other]

    cs.AI cs.HC

    From Human to Machine Psychology: A Conceptual Framework for Understanding Well-Being in Large Language Models

    Authors: G. R. Lau, W. Y. Low

    Abstract: As large language models (LLMs) increasingly simulate human cognition and behavior, researchers have begun to investigate their psychological properties. Yet, what it means for such models to flourish, a core construct in human well-being, remains unexplored. This paper introduces the concept of machine flourishing and proposes the PAPERS framework, a six-dimensional model derived from thematic an…

    Submitted 26 June, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

  3. arXiv:2506.04225  [pdf, ps, other]

    cs.CV

    Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

    Authors: Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson W. H. Lau, Wangmeng Zuo, Chunchao Guo

    Abstract: Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text or images, creating long-range, 3D-consistent, explorable 3D scenes remains a complex and challenging problem. In this work, we present Voyager, a novel video di…

    Submitted 4 June, 2025; originally announced June 2025.

  4. arXiv:2505.05064  [pdf, ps, other]

    cs.LG

    WaterDrum: Watermarking for Data-centric Unlearning Metric

    Authors: Xinyang Lu, Xinyuan Niu, Gregory Kang Ruey Lau, Bui Thi Cam Nhung, Rachael Hwee Ling Sim, Fanyu Wen, Chuan-Sheng Foo, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain set have se…

    Submitted 8 May, 2025; originally announced May 2025.

  5. arXiv:2503.07593  [pdf, other]

    cs.CV

    Hierarchical Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

    Authors: Youjun Zhao, Jiaying Lin, Rynson W. H. Lau

    Abstract: Open-vocabulary 3D object detection (OV-3DOD) aims at localizing and classifying novel objects beyond closed sets. The recent success of vision-language models (VLMs) has demonstrated their remarkable capabilities to understand open vocabularies. Existing works that leverage VLMs for 3D object detection (3DOD) generally resort to representations that lose the rich scene context required for 3D per…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: AAAI 2025 (Extended Version). Project Page: https://youjunzhao.github.io/HCMA/

  6. arXiv:2503.07070  [pdf, other]

    cs.LG cs.AI physics.comp-ph physics.data-an stat.ML

    PIED: Physics-Informed Experimental Design for Inverse Problems

    Authors: Apivich Hemachandra, Gregory Kang Ruey Lau, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: In many science and engineering settings, system dynamics are characterized by governing PDEs, and a major challenge is to solve inverse problems (IPs) where unknown PDE parameters are inferred based on observational data gathered under limited budget. Due to the high costs of setting up and running experiments, experimental design (ED) is often done with the help of PDE simulations to optimize fo…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted to 13th International Conference on Learning Representations (ICLR 2025), 31 pages

  7. arXiv:2502.00270  [pdf, other]

    cs.LG cs.AI stat.ML

    DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

    Authors: Zhiliang Chen, Gregory Kang Ruey Lau, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, the data involved in an unseen evaluation task is often unknown (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data are relevant for fine-tuning the LLM to maximize its performance on the specific unseen evalua…

    Submitted 18 May, 2025; v1 submitted 31 January, 2025; originally announced February 2025.

  8. arXiv:2412.15238  [pdf, other]

    cs.CL cs.AI cs.LG cs.MA

    Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks

    Authors: Gregory Kang Ruey Lau, Wenyang Hu, Diwen Liu, Jizhuo Chen, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Large Language Models still encounter substantial challenges in reasoning tasks, especially for smaller models, which many users may be restricted to due to resource constraints (e.g. GPU memory restrictions). Inference-time methods to boost LLM performance, such as prompting methods to invoke certain reasoning pathways in responses, have been shown effective in past works, though they largely rel…

    Submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted to NeurIPS 2024 Workshop on Foundation Model Interventions (MINT)

  9. arXiv:2412.09603  [pdf, other]

    cs.CV

    Do Multimodal Large Language Models See Like Humans?

    Authors: Jiaying Lin, Shuquan Ye, Rynson W. H. Lau

    Abstract: Multimodal Large Language Models (MLLMs) have achieved impressive results on various vision tasks, leveraging recent advancements in large language models. However, a critical question remains unaddressed: do MLLMs perceive visual information similarly to humans? Current benchmarks lack the ability to evaluate MLLMs from this perspective. To address this challenge, we introduce HVSBench, a large-s…

    Submitted 27 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Project page: https://jiaying.link/HVSBench/

  10. arXiv:2411.14429  [pdf, other]

    cs.CV cs.AI

    Revisiting the Integration of Convolution and Attention for Vision Backbone

    Authors: Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau

    Abstract: Convolutions (Convs) and multi-head self-attentions (MHSAs) are typically considered alternatives to each other for building vision backbones. Although some works try to integrate both, they apply the two operators simultaneously at the finest pixel granularity. With Convs responsible for per-pixel feature extraction already, the question is whether we still need to include the heavy MHSAs at such…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  11. arXiv:2411.06757  [pdf, other]

    cs.CV

    LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes

    Authors: Zefan Qu, Ke Xu, Gerhard Petrus Hancke, Rynson W. H. Lau

    Abstract: Neural Radiance Fields (NeRFs) have shown remarkable performances in producing novel-view images from high-quality scene images. However, hand-held low-light photography challenges NeRFs as the captured images may simultaneously suffer from low visibility, noise, and camera shakes. While existing NeRF methods may handle either low light or motion, directly combining them or incorporating additiona…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: Accepted by NeurIPS 2024

  12. arXiv:2410.01544  [pdf, other]

    cs.CV

    Boosting Weakly-Supervised Referring Image Segmentation via Progressive Comprehension

    Authors: Zaiquan Yang, Yuhao Liu, Jiaying Lin, Gerhard Hancke, Rynson W. H. Lau

    Abstract: This paper explores the weakly-supervised referring image segmentation (WRIS) problem, and focuses on a challenging setup where target localization is learned directly from image-text pairs. We note that the input text description typically already contains detailed information on how to localize the target object, and we also observe that humans often follow a step-by-step comprehension process (…

    Submitted 4 December, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS2024

  13. arXiv:2409.11406  [pdf, other]

    cs.CV

    Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

    Authors: Zhenwei Wang, Tengfei Wang, Zexin He, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

    Abstract: In 3D modeling, designers often use an existing 3D model as a reference to create new ones. This practice has inspired the development of Phidias, a novel generative model that uses diffusion for reference-augmented 3D generation. Given an image, our method leverages a retrieved or user-provided 3D reference model to guide the generation process, thereby enhancing the generation quality, generaliz…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: Project page: https://RAG-3D.github.io/

  14. arXiv:2408.11030  [pdf, other]

    cs.CV

    OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

    Authors: Youjun Zhao, Jiaying Lin, Shuquan Ye, Qianshi Pang, Rynson W. H. Lau

    Abstract: Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond the closed set of object classes. However, existing approaches and benchmarks primarily focus on the open vocabulary problem within the context of object classes, which is insufficient in providing a holistic evaluation to what extent a model understands the 3D scene. In this paper, we introduce a mor…

    Submitted 9 March, 2025; v1 submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2407.04411  [pdf, other]

    cs.CR cs.AI cs.CL

    Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs

    Authors: Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of u…

    Submitted 29 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  16. arXiv:2406.14473  [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  17. arXiv:2406.10652  [pdf, ps, other]

    cs.CV

    MDeRainNet: An Efficient Macro-pixel Image Rain Removal Network

    Authors: Tao Yan, Weijiang He, Chenglong Wang, Cihang Wei, Xiangjie Zhu, Yinghui Wang, Rynson W. H. Lau

    Abstract: Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benef…

    Submitted 23 June, 2025; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 14 pages, 14 figures, 4 tables

  18. arXiv:2406.01476  [pdf, other]

    cs.CV

    DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors

    Authors: Tianyu Huang, Haoze Zhang, Yihan Zeng, Zhilu Zhang, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

    Abstract: Dynamic 3D interaction has been attracting a lot of attention recently. However, creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, which requires manually assigning precise physical properties to the object or the simulated results would become unnatural. Another solution is to learn the deformation of 3D objects with the distillation…

    Submitted 18 December, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by AAAI 2025. Codes are released at: https://github.com/tyhuang0428/DreamPhysics

  19. arXiv:2405.17725  [pdf, other]

    cs.CV

    Color Shift Estimation-and-Correction for Image Enhancement

    Authors: Yiyu Li, Ke Xu, Gerhard Petrus Hancke, Rynson W. H. Lau

    Abstract: Images captured under sub-optimal illumination conditions may contain both over- and under-exposures. Current approaches mainly focus on adjusting image brightness, which may exacerbate the color tone distortion in under-exposed areas and fail to restore accurate colors in over-exposed regions. We observe that over- and under-exposed regions display opposite color tone distribution shifts with res…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: CVPR2024 accepted paper

  20. arXiv:2404.07662  [pdf, other]

    cs.LG cs.AI physics.comp-ph physics.data-an stat.ML

    PINNACLE: PINN Adaptive ColLocation and Experimental points selection

    Authors: Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng, Bryan Kian Hsiang Low

    Abstract: Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this l…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to 12th International Conference on Learning Representations (ICLR 2024), 36 pages

  21. arXiv:2403.17013  [pdf, other]

    cs.CV cs.LG

    Temporal-Spatial Processing of Event Camera Data via Delay-Loop Reservoir Neural Network

    Authors: Richard Lau, Anthony Tylan-Tyler, Lihan Yao, Rey de Castro Roberto, Robert Taylor, Isaiah Jones

    Abstract: This paper describes a temporal-spatial model for video processing with special applications to processing event camera videos. We propose to study a conjecture motivated by our previous study of video processing with delay loop reservoir (DLR) neural network, which we call Temporal-Spatial Conjecture (TSC). The TSC postulates that there is significant information content carried in the temporal r…

    Submitted 12 February, 2024; originally announced March 2024.

    Comments: 10 pages, 12 figures, DARPA Distribution Statement A. Approved for public release. Distribution Unlimited

  22. arXiv:2403.16224  [pdf, other]

    cs.CV

    Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields

    Authors: Haoyuan Wang, Wenbo Hu, Lei Zhu, Rynson W. H. Lau

    Abstract: Inverse rendering aims at recovering both geometry and materials of objects. It provides a more compatible reconstruction for conventional rendering engines, compared with the neural radiance fields (NeRFs). On the other hand, existing NeRF-based inverse rendering methods cannot handle glossy objects with local light interactions well, as they typically oversimplify the illumination as a 2D enviro…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 paper. Project webpage https://whyy.site/paper/nep

  23. arXiv:2403.15383  [pdf, other]

    cs.CV

    ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

    Authors: Zhenwei Wang, Tengfei Wang, Gerhard Hancke, Ziwei Liu, Rynson W. H. Lau

    Abstract: Real-world applications often require a large gallery of 3D assets that share a consistent theme. While remarkable advances have been made in general 3D content creation from text or image, synthesizing customized 3D assets following the shared theme of input 3D exemplars remains an open and challenging problem. In this work, we present ThemeStation, a novel approach for theme-aware 3D-to-3D gener…

    Submitted 15 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted to SIGGRAPH 2024. Project page: https://3dthemestation.github.io/

  24. arXiv:2403.00644  [pdf, other]

    cs.CV

    Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks

    Authors: Yuhao Liu, Zhanghan Ke, Fang Liu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with handling diverse low-level tasks that require details preservation. To overcome this limitation, we present a new Diff-Plugin framework to enable a single pre-trained diffusion model to generate high-fidelity result…

    Submitted 28 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024. Replaced some celebrity images to avoid copyright disputes

  25. arXiv:2402.14808  [pdf, other]

    cs.CL

    RelayAttention for Efficient Large Language Model Serving with Long System Prompts

    Authors: Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson W. H. Lau

    Abstract: A practical large language model (LLM) service may involve a long system prompt, which specifies the instructions, examples, and knowledge documents of the task and is reused across requests. However, the long system prompt causes throughput/latency bottlenecks as the cost of generating the next token grows w.r.t. the sequence length. This paper aims to improve the efficiency of LLM services that…

    Submitted 30 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted by the ACL 2024 main conference

  26. arXiv:2402.13631  [pdf, other]

    cs.CV

    Delving into Dark Regions for Robust Shadow Detection

    Authors: Huankang Guan, Ke Xu, Rynson W. H. Lau

    Abstract: Shadow detection is a challenging task as it requires a comprehensive understanding of shadow characteristics and global/local illumination conditions. We observe from our experiment that state-of-the-art deep methods tend to have higher error rates in differentiating shadow pixels from non-shadow pixels in dark regions (i.e., regions with low-intensity values). Our key insight to this problem is th…

    Submitted 21 February, 2024; originally announced February 2024.

  27. arXiv:2402.00341  [pdf, other]

    cs.CV

    Recasting Regional Lighting for Shadow Removal

    Authors: Yuhao Liu, Zhanghan Ke, Ke Xu, Fang Liu, Zhenwei Wang, Rynson W. H. Lau

    Abstract: Removing shadows requires an understanding of both lighting conditions and object textures in a scene. Existing methods typically learn pixel-level color mappings between shadow and non-shadow images, in which the joint modeling of lighting and object textures is implicit and inadequate. We observe that in a shadow region, the degradation degree of object textures depends on the local illumination…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: AAAI 2024 (Oral)

  28. arXiv:2312.06439  [pdf, other]

    cs.CV

    DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior

    Authors: Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: 3D generation has raised great attention in recent years. With the success of text-to-image diffusion models, the 2D-lifting technique becomes a promising route to controllable 3D generation. However, these methods tend to present inconsistent geometry, which is also known as the Janus problem. We observe that the problem is caused mainly by two aspects, i.e., viewpoint bias in 2D diffusion models…

    Submitted 12 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  29. arXiv:2310.05373  [pdf, other]

    cs.LG cs.AI

    Quantum Bayesian Optimization

    Authors: Zhongxiang Dai, Gregory Kang Ruey Lau, Arun Verma, Yao Shu, Bryan Kian Hsiang Low, Patrick Jaillet

    Abstract: Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number T of iterations, and a regret lower bound of Omega(sqrt(T)) has been derived which represents the unavoidable regrets f…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  30. arXiv:2309.17175  [pdf, other]

    cs.CV

    TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields

    Authors: Tianyu Huang, Yihan Zeng, Bowen Dong, Hang Xu, Songcen Xu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: Recent works learn 3D representation explicitly under text-3D guidance. However, limited text-3D data restricts the vocabulary scale and text control of generations. Generators may easily fall into a stereotype concept for certain text prompts, thus losing open-vocabulary generation ability. To tackle this issue, we introduce a conditional 3D generative model, namely TextField3D. Specifically, rat…

    Submitted 14 March, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted by ICLR 2024

  31. arXiv:2309.09774  [pdf, other]

    cs.LG cs.CV

    Towards Self-Adaptive Pseudo-Label Filtering for Semi-Supervised Learning

    Authors: Lei Zhu, Zhanghan Ke, Rynson Lau

    Abstract: Recent semi-supervised learning (SSL) methods typically include a filtering strategy to improve the quality of pseudo labels. However, these filtering strategies are usually hand-crafted and do not change as the model is updated, resulting in a lot of correct pseudo labels being discarded and incorrect pseudo labels being selected during the training process. In this work, we observe that the dist…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: This paper was first submitted to NeurIPS 2021

  32. arXiv:2308.14575  [pdf, other]

    cs.CV

    Referring Image Segmentation Using Text Supervision

    Authors: Fang Liu, Yuhao Liu, Yuqiu Kong, Ke Xu, Lihe Zhang, Baocai Yin, Gerhard Hancke, Rynson Lau

    Abstract: Existing Referring Image Segmentation (RIS) methods typically require expensive pixel-level or box-level annotations for supervision. In this paper, we observe that the referring texts used in RIS already provide sufficient information to localize the target object. Hence, we propose a novel weakly-supervised RIS framework to formulate the target localization problem as a classification process to…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  33. arXiv:2308.03059  [pdf, other]

    cs.CV cs.AI cs.GR

    Language-based Photo Color Adjustment for Graphic Designs

    Authors: Zhenwei Wang, Nanxuan Zhao, Gerhard Hancke, Rynson W. H. Lau

    Abstract: Adjusting the photo color to associate with some design elements is an essential way for a graphic design to effectively deliver its message and make it aesthetically pleasing. However, existing tools and previous works face a dilemma between the ease of use and level of expressiveness. To this end, we introduce an interactive language-based approach for photo recoloring, which provides an intuiti…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: 15 pages, 19 figures. Accepted by SIGGRAPH 2023. Project page: https://zhenwwang.github.io/langrecol

  34. arXiv:2307.10664  [pdf, other]

    cs.CV cs.GR

    Lighting up NeRF via Unsupervised Decomposition and Enhancement

    Authors: Haoyuan Wang, Xiaogang Xu, Ke Xu, Rynson WH. Lau

    Abstract: Neural Radiance Field (NeRF) is a promising approach for synthesizing novel views, given a set of images and the corresponding camera poses of a scene. However, images photographed from a low-light scene can hardly be used to train a NeRF model to produce high-quality results, due to their low pixel intensities, heavy noise, and color distortion. Combining existing low-light image enhancement meth…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: ICCV 2023. Project website: https://whyy.site/paper/llnerf

  35. arXiv:2303.13511  [pdf, other]

    cs.CV cs.AI cs.LG

    Neural Preset for Color Style Transfer

    Authors: Zhanghan Ke, Yuhao Liu, Lei Zhu, Nanxuan Zhao, Rynson W. H. Lau

    Abstract: In this paper, we present a Neural Preset technique to address the limitations of existing color style transfer methods, including visual artifacts, vast memory requirement, and slow style switching speed. Our method is based on two core designs. First, we propose Deterministic Neural Color Mapping (DNCM) to consistently operate on each pixel via an image-adaptive color mapping matrix, avoiding ar…

    Submitted 24 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Project page with demos: https://zhkkke.github.io/NeuralPreset . Artifact-free real-time 4K color style transfer via AI-generated presets. CVPR 2023

  36. arXiv:2303.08810  [pdf, other]

    cs.CV

    BiFormer: Vision Transformer with Bi-Level Routing Attention

    Authors: Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson Lau

    Abstract: As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 camera-ready

  37. arXiv:2301.03182  [pdf, other]

    cs.CV

    Structure-Informed Shadow Removal Networks

    Authors: Yuhao Liu, Qing Guo, Lan Fu, Zhanghan Ke, Ke Xu, Wei Feng, Ivor W. Tsang, Rynson W. H. Lau

    Abstract: Existing deep learning-based shadow removal methods still produce images with shadow remnants. These shadow remnants typically exist in homogeneous regions with low-intensity values, making them untraceable in the existing image-to-image mapping paradigm. We observe that shadows mainly degrade images at the image-structure level (in which humans perceive object shapes and continuous colors). Hence…

    Submitted 1 February, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: IEEE TIP

  38. arXiv:2211.15644  [pdf, other]

    cs.CV

    Efficient Mirror Detection via Multi-level Heterogeneous Learning

    Authors: Ruozhen He, Jiaying Lin, Rynson W. H. Lau

    Abstract: We present HetNet (Multi-level Heterogeneous Network), a highly efficient mirror detection network. Current mirror detection methods focus more on performance than efficiency, limiting real-time applications (such as drones). Their lack of efficiency stems from the common design of adopting homogeneous modules at different levels, which ignores the difference between diffe…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI 2023. The code is available at https://github.com/Catherine-R-He/HetNet

  39. arXiv:2210.01055  [pdf, other]

    cs.CV

    CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training

    Authors: Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, Wangmeng Zuo

    Abstract: Pre-training across 3D vision and language remains under development because of limited training data. Recent works attempt to transfer vision-language pre-training models to 3D vision. PointCLIP converts point cloud data to multi-view depth maps, adopting CLIP for shape classification. However, its performance is restricted by the domain gap between rendered depth maps and images, as well as the…

    Submitted 22 August, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by ICCV2023

  40. Large-Field Contextual Feature Learning for Glass Detection

    Authors: Haiyang Mei, Xin Yang, Letian Yu, Qiang Zhang, Xiaopeng Wei, Rynson W. H. Lau

    Abstract: Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass. In this paper, we propose an important problem of detecting glass surfaces from a single RG…

    Submitted 10 September, 2022; originally announced September 2022.

  41. Rain Removal from Light Field Images with 4D Convolution and Multi-scale Gaussian Process

    Authors: Tao Yan, Mingyue Li, Bin Li, Yang Yang, Rynson W. H. Lau

    Abstract: Existing deraining methods focus mainly on a single input image. However, with just a single input image, it is extremely difficult to accurately detect and remove rain streaks, in order to restore a rain-free image. In contrast, a light field image (LFI) embeds abundant 3D structure and texture information of the target scene by recording the direction and position of each incident ray via a plen…

    Submitted 27 January, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: This paper has been published in IEEE Transactions on Image Processing

    Journal ref: IEEE Transactions on Image Processing (2023), v32, pages 921-936

  42. arXiv:2207.14083  [pdf, other]

    cs.CV

    Weakly-Supervised Camouflaged Object Detection with Scribble Annotations

    Authors: Ruozhen He, Qihua Dong, Jiaying Lin, Rynson W. H. Lau

    Abstract: Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, due to the ambiguous boundary, annotating camouflage objects pixel-wisely is very time-consuming and labor-intensive, taking ~60mins to label one image. In this paper, we propose the first weakly-supervised COD method, using scribble annotations as supervision. To achieve…

    Submitted 28 November, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Accepted to AAAI 2023. The code and dataset are available at https://github.com/dddraxxx/Weakly-Supervised-Camouflaged-Object-Detection-with-Scribble-Annotations

  43. arXiv:2207.06332  [pdf, other]

    cs.CV

    Symmetry-Aware Transformer-based Mirror Detection

    Authors: Tianyu Huang, Bowen Dong, Jiaying Lin, Xiaohui Liu, Rynson W. H. Lau, Wangmeng Zuo

    Abstract: Mirror detection aims to identify the mirror regions in the given input image. Existing works mainly focus on integrating the semantic features and structural features to mine specific relations between mirror and non-mirror regions, or introducing mirror properties like depth or chirality to help analyze the existence of mirrors. In this work, we observe that a real object typically forms a loose…

    Submitted 4 September, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

  44. arXiv:2207.01322  [pdf, other]

    cs.CV

    Harmonizer: Learning to Perform White-Box Image and Video Harmonization

    Authors: Zhanghan Ke, Chunyi Sun, Lei Zhu, Ke Xu, Rynson W. H. Lau

    Abstract: Recent works on image harmonization solve the problem as a pixel-wise image translation task via large autoencoders. They have unsatisfactory performances and slow inference speeds when dealing with high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from the…

    Submitted 20 July, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

  45. arXiv:2206.11250  [pdf, other]

    cs.CV

    Leveraging RGB-D Data with Cross-Modal Context Mining for Glass Surface Detection

    Authors: Jiaying Lin, Yuen-Hei Yeung, Shuquan Ye, Rynson W. H. Lau

    Abstract: Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This, however, poses substantial challenges to the operations of autonomous systems such as robots, self-driving cars, and drones, as these glass panels can become transparent obstacles to navigation. Existing works attempt to exploit various cues, including glass boundary context or reflecti…

    Submitted 16 December, 2024; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Accepted to AAAI 2025. Project Page: https://jiaying.link/AAAI25-RGBDGlass/

  46. arXiv:2203.17257  [pdf, other]

    cs.CV

    Rethinking Video Salient Object Ranking

    Authors: Jiaying Lin, Huankang Guan, Rynson W. H. Lau

    Abstract: Salient Object Ranking (SOR) involves ranking the degree of saliency of multiple salient objects in an input image. Most recently, a method is proposed for ranking salient objects in an input video based on a predicted fixation map. It relies solely on the density of the fixations within the salient objects to infer their saliency ranks, which is incompatible with human perception of saliency rank…

    Submitted 31 March, 2022; originally announced March 2022.

  47. arXiv:2203.09416  [pdf, other]

    cs.CV

    Bi-directional Object-context Prioritization Learning for Saliency Ranking

    Authors: Xin Tian, Ke Xu, Xin Yang, Lin Du, Baocai Yin, Rynson W. H. Lau

    Abstract: The saliency ranking task is recently proposed to study the visual behavior that humans would typically shift their attention over different objects of a scene based on their degrees of saliency. Existing approaches focus on learning either object-object or object-scene relations. Such a strategy follows the idea of object-based attention in Psychology, but it tends to favor those objects with str…

    Submitted 22 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022

  48. arXiv:2112.02082  [pdf, other]

    cs.CV

    Geometry-aware Two-scale PIFu Representation for Human Reconstruction

    Authors: Zheng Dong, Ke Xu, Ziheng Duan, Hujun Bao, Weiwei Xu, Rynson W. H. Lau

    Abstract: Although PIFu-based 3D human reconstruction methods are popular, the quality of recovered details is still unsatisfactory. In a sparse (e.g., 3 RGBD sensors) capture setting, the depth noise is typically amplified in the PIFu representation, resulting in flat facial surfaces and geometry-fallible bodies. In this paper, we propose a novel geometry-aware two-scale PIFu for 3D human reconstruction fr…

    Submitted 27 September, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

    Comments: Accepted by NeurIPS 2022. 20 pages, 20 figures

  49. arXiv:2111.10137  [pdf, other]

    cs.CV

    Learning to Detect Instance-level Salient Objects Using Complementary Image Labels

    Authors: Xin Tian, Ke Xu, Xin Yang, Baocai Yin, Rynson W. H. Lau

    Abstract: Existing salient instance detection (SID) methods typically learn from pixel-level annotated datasets. In this paper, we present the first weakly-supervised approach to the SID problem. Although weak supervision has been considered in general saliency detection, it is mainly based on using class labels for object localization. However, it is non-trivial to use only class labels to learn instance-a…

    Submitted 19 November, 2021; originally announced November 2021.

    Comments: To appear in IJCV. arXiv admin note: text overlap with arXiv:2009.13898

  50. arXiv:2109.11818  [pdf, other]

    cs.CV

    MODNet-V: Improving Portrait Video Matting via Background Restoration

    Authors: Jiayu Sun, Zhanghan Ke, Lihe Zhang, Huchuan Lu, Rynson W. H. Lau

    Abstract: To address the challenging portrait video matting problem more precisely, existing works typically apply some matting priors that require additional user efforts to obtain, such as annotated trimaps or background images. In this work, we observe that instead of asking the user to explicitly provide a background image, we may recover it from the input video itself. To this end, we first propose a n…

    Submitted 24 September, 2021; originally announced September 2021.