
Showing 1–50 of 92 results for author: Wong, K K

  1. arXiv:2506.07986

    cs.CV

    Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

    Authors: Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, Wangmeng Zuo, Ziwei Liu, Kwan-Yee K. Wong

    Abstract: Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-driven visual generation. However, even state-of-the-art MM-DiT models like FLUX struggle with achieving precise alignment between text prompts and generated content. We identify two key issues in the attention mechanism of MM-DiT, namely 1) the suppression of cross-modal attention due to token imbalance between…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Project Page: https://vchitect.github.io/TACA/

  2. arXiv:2506.03123

    cs.CV

    DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

    Authors: Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu

    Abstract: Diffusion Models have achieved remarkable results in video synthesis but require iterative denoising steps, leading to substantial computational overhead. Consistency Models have made significant progress in accelerating diffusion models. However, directly applying them to video diffusion models often results in severe degradation of temporal consistency and appearance details. In this paper, by a…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2504.19162

    cs.CL cs.AI cs.LG

    SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

    Authors: Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong

    Abstract: Evaluating the step-by-step reliability of large language model (LLM) reasoning, such as Chain-of-Thought, remains challenging due to the difficulty and cost of obtaining high-quality step-level supervision. In this paper, we introduce Self-Play Critic (SPC), a novel approach where a critic model evolves its ability to assess reasoning steps through adversarial self-play games, eliminating the nee…

    Submitted 17 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Project webpage: https://chen-judge.github.io/SPC/

  4. arXiv:2503.07157

    cs.CV

    MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction

    Authors: Hung Q. Vo, Pengyu Yuan, Zheng Yin, Kelvin K. Wong, Chika F. Ezeana, Son T. Ly, Stephen T. C. Wong, Hien V. Nguyen

    Abstract: Self-supervised learning (SSL) has garnered substantial interest within the machine learning and computer vision communities. Two prominent approaches in SSL include contrastive-based learning and self-distillation utilizing cropping augmentation. Lately, masked image modeling (MIM) has emerged as a more potent SSL technique, employing image inpainting as a pretext task. MIM creates a strong induc…

    Submitted 22 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  5. arXiv:2503.06759

    cs.CV

    Revisiting Invariant Learning for Out-of-Domain Generalization on Multi-Site Mammogram Datasets

    Authors: Hung Q. Vo, Samira Zare, Son T. Ly, Lin Wang, Chika F. Ezeana, Xiaohui Yu, Kelvin K. Wong, Stephen T. C. Wong, Hien V. Nguyen

    Abstract: Despite significant progress in robust deep learning techniques for mammogram breast cancer classification, their reliability in real-world clinical development settings remains uncertain. The translation of these models to clinical practice faces challenges due to variations in medical centers, imaging protocols, and patient populations. To enhance their robustness, invariant learning methods hav…

    Submitted 9 March, 2025; originally announced March 2025.

  6. arXiv:2501.12267

    cs.CV

    VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models

    Authors: Chaohao Xie, Kai Han, Kwan-Yee K. Wong

    Abstract: Recent video inpainting methods have achieved encouraging improvements by leveraging optical flow to guide pixel propagation from reference frames either in the image space or feature space. However, they would produce severe artifacts in the mask center when the masked area is too large and no pixel correspondences can be found for the center. Recently, diffusion models have demonstrated impressi…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: 10 pages, 5 Figures (Accepted at WACV 2025)

  7. arXiv:2501.08977

    cs.AI

    Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models

    Authors: Emma Croxford, Yanjun Gao, Nicholas Pellegrino, Karen K. Wong, Graham Wills, Elliot First, Miranda Schnier, Kyle Burton, Cris G. Ebby, Jillian Gorskic, Matthew Kalscheur, Samy Khalil, Marie Pisani, Tyler Rubeor, Peter Stetson, Frank Liao, Cherodeep Goswami, Brian Patterson, Majid Afshar

    Abstract: As Large Language Models (LLMs) are integrated into electronic health record (EHR) workflows, validated instruments are essential to evaluate their performance before implementation. Existing instruments for provider documentation quality are often unsuitable for the complexities of LLM-generated text and lack validation on real-world data. The Provider Documentation Summarization Quality Instrume…

    Submitted 17 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

  8. arXiv:2412.20418

    eess.IV cs.CV

    Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

    Authors: Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Multimodal learning has been demonstrated to enhance performance across various clinical tasks, owing to the diverse perspectives offered by different modalities of data. However, existing multimodal segmentation methods rely on well-registered multimodal data, which is unrealistic for real-world clinical images, particularly for indistinct and diffuse regions such as liver tumors. In this paper,…

    Submitted 29 December, 2024; originally announced December 2024.

  9. arXiv:2410.21076

    astro-ph.IM astro-ph.HE cs.LG gr-qc

    Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows

    Authors: Alicja Polanska, Thibeau Wouters, Peter T. H. Pang, Kaze K. W. Wong, Jason D. McEwen

    Abstract: We present an accelerated pipeline, based on high-performance computing techniques and normalizing flows, for joint Bayesian parameter estimation and model selection and demonstrate its efficiency in gravitational wave astrophysics. We integrate the Jim inference toolkit, a normalizing flow-enhanced Markov chain Monte Carlo (MCMC) sampler, with the learned harmonic mean estimator. Our Bayesian evi…

    Submitted 31 October, 2024; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: accepted to NeurIPS 2024 workshop on Machine Learning and the Physical Sciences

  10. arXiv:2410.19355

    cs.CV

    FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

    Authors: Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong

    Abstract: In this paper, we present FasterCache, a novel training-free strategy designed to accelerate the inference of video diffusion models with high-quality generation. By analyzing existing cache-based methods, we observe that directly reusing adjacent-step features degrades video quality due to the loss of subtle variations. We further perform a pioneering investigation of t…

    Submitted 11 March, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

  11. arXiv:2410.14672

    cs.CV cs.AI

    BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

    Authors: Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong

    Abstract: We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for…

    Submitted 5 January, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

    Comments: Updated with additional T2I results; Project page: https://haoosz.github.io/BiGR

  12. arXiv:2410.07164

    cs.CV

    AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

    Authors: Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu

    Abstract: Recent advancements in diffusion models have led to significant improvements in the generation and animation of 4D full-body human-object interactions (HOI). Nevertheless, existing methods primarily focus on SMPL-based motion generation, which is limited by the scarcity of realistic large-scale interaction data. This constraint affects their ability to create everyday HOI scenes. This paper addres…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Project page: https://yukangcao.github.io/AvatarGO/

  13. arXiv:2409.18170

    cs.CL cs.AI

    Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review

    Authors: Emma Croxford, Yanjun Gao, Nicholas Pellegrino, Karen K. Wong, Graham Wills, Elliot First, Frank J. Liao, Cherodeep Goswami, Brian Patterson, Majid Afshar

    Abstract: Large Language Models have advanced clinical Natural Language Generation, creating opportunities to manage the volume of medical text. However, the high-stakes nature of medicine requires reliable evaluation, which remains a challenge. In this narrative review, we assess the current evaluation state for clinical summarization tasks and propose future directions to address the resource constraints…

    Submitted 26 September, 2024; originally announced September 2024.

  14. arXiv:2409.03745

    cs.CV

    ArtiFade: Learning to Generate High-quality Subject from Blemished Images

    Authors: Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong

    Abstract: Subject-driven text-to-image generation has witnessed remarkable advancements in its ability to learn and capture characteristics of a subject using only a limited number of images. However, existing methods commonly rely on high-quality images for training and may struggle to generate reasonable images when the input images are blemished by artifacts. This is primarily attributed to the inadequat…

    Submitted 5 September, 2024; originally announced September 2024.

  15. arXiv:2407.07077

    cs.CV cs.AI

    ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

    Authors: Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong

    Abstract: While personalized text-to-image generation has enabled the learning of a single concept from multiple images, a more practical yet challenging scenario involves learning multiple concepts within a single image. However, existing works tackling this scenario heavily rely on extensive human annotations. In this paper, we introduce a novel task named Unsupervised Concept Extraction (UCE) that consid…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project page: https://haoosz.github.io/ConceptExpress/

  16. arXiv:2407.05890

    cs.RO cs.CL

    Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Lin Ma, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: LLM-based agents have demonstrated impressive zero-shot performance in the vision-language navigation (VLN) task. However, existing LLM-based methods often focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in navigation scenarios. To bridge this gap, we propose AO-Planner, a novel Affordances-Oriented Planner…

    Submitted 20 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  17. arXiv:2406.04253

    cs.CV

    A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation

    Authors: Ruihe Wang, Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: 3D modeling has long been an important area in computer vision and computer graphics. Recently, thanks to the breakthroughs in neural representations and generative models, we witnessed a rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures

  18. arXiv:2403.07860

    cs.CV

    Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

    Authors: Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe Xu, Kwan-Yee K. Wong

    Abstract: Text-to-image generation has made significant advancements with the introduction of text-to-image diffusion models. These models typically consist of a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is great potential in exploring the replacement of component…

    Submitted 12 March, 2024; originally announced March 2024.

  19. arXiv:2403.02452

    physics.optics physics.med-ph

    Programming the scalable optical learning operator with spatial-spectral optimization

    Authors: Yi Zhou, Jih-Liang Hsieh, Ilker Oguz, Mustafa Yildirim, Niyazi Ulas Dinc, Carlo Gigli, Kenneth K. Y. Wong, Christophe Moser, Demetri Psaltis

    Abstract: Electronic computers have evolved drastically over the past years with an ever-growing demand for improved performance. However, the transfer of information from memory and high energy consumption have emerged as issues that require solutions. Optical techniques are considered promising solutions to these problems with higher speed than their electronic counterparts and with reduced energy consump…

    Submitted 4 March, 2024; originally announced March 2024.

  20. arXiv:2403.01852

    cs.CV

    PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

    Authors: Zhengyao Lv, Yuxiang Wei, Wangmeng Zuo, Kwan-Yee K. Wong

    Abstract: Recent advancements in large-scale pre-trained text-to-image models have led to remarkable progress in semantic image synthesis. Nevertheless, synthesizing high-quality images with consistent semantics and layout remains a challenge. In this paper, we propose the adaPtive LAyout-semantiC fusion modulE (PLACE) that harnesses pre-trained models to alleviate the aforementioned issues. Specifically, w…

    Submitted 4 March, 2024; originally announced March 2024.

  21. arXiv:2402.17502

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  22. arXiv:2401.14074

    cs.CV cs.LG

    ProCNS: Progressive Prototype Calibration and Noise Suppression for Weakly-Supervised Medical Image Segmentation

    Authors: Y. Liu, L. Lin, K. K. Y. Wong, X. Tang

    Abstract: Weakly-supervised segmentation (WSS) has emerged as a solution to mitigate the conflict between annotation cost and model performance by adopting sparse annotation formats (e.g., point, scribble, block, etc.). Typical approaches attempt to exploit anatomy and topology priors to directly expand sparse annotations into pseudo-labels. However, due to a lack of attention to the ambiguous edges in medi…

    Submitted 23 December, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  23. arXiv:2401.07314

    cs.AI cs.CV cs.RO

    MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment…

    Submitted 20 June, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: LLM/VLM-based VLN Agents. Accepted to ACL 2024. Project: https://chen-judge.github.io/MapGPT/

  24. arXiv:2311.13535

    cs.CV

    DiffusionMat: Alpha Matting as Sequential Refinement Learning

    Authors: Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo

    Abstract: In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes. Diverging from conventional methods that utilize trimaps merely as loose guidance for alpha matte prediction, our approach treats image matting as a sequential refinement learning process. This process begins with the addition of noise to…

    Submitted 22 November, 2023; originally announced November 2023.

  25. arXiv:2310.03972

    math.GM

    Inequality and Nyman-Beurling-Baez-Duarte criteria

    Authors: Kwok Kwan Wong

    Abstract: We propose a proof of the Riemann hypothesis. The proof is based on the Nyman-Beurling-Baez-Duarte condition. By proving the existence of a solution to a system of inequalities, we show that there is a sequence, acting as the coefficients of Beurling's sequence, that can approximate the constant vector in a weighted Hilbert space.

    Submitted 7 November, 2023; v1 submitted 14 March, 2023; originally announced October 2023.

    Comments: 9 pages, version 1 is wrong. Version 2 fixed an argument in the proof. This is version 4

    MSC Class: 11Mxx; 46Cxx

  26. arXiv:2310.01412

    cs.CV cs.RO

    DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

    Authors: Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kwan-Yee K. Wong, Zhenguo Li, Hengshuang Zhao

    Abstract: Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on…

    Submitted 8 November, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Accepted by RA-L. The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/

  27. arXiv:2308.09705

    cs.CV

    Guide3D: Create 3D Avatars from Text and Image Guidance

    Authors: Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

    Abstract: Recently, text-to-image generation has exhibited remarkable advancements, with the ability to produce visually impressive results. In contrast, text-to-3D generation has not yet reached a comparable level of quality. Existing methods primarily rely on text-guided score distillation sampling (SDS), and they encounter difficulties in transferring 2D attributes of the generated images to 3D content.…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: 25 pages, 22 figures

  28. arXiv:2308.08543

    cs.CV cs.RO

    InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping

    Authors: Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao

    Abstract: Vectorized high-definition (HD) maps contain detailed information about surrounding road elements, which are crucial for various downstream tasks in modern autonomous vehicles, such as motion planning and vehicle control. Recent works attempt to directly detect the vectorized HD map as a point set prediction task, achieving notable detection performance improvements. However, these methods usually…

    Submitted 8 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Code and demo will be available at https://tonyxuqaq.github.io/InsMapper/

  29. arXiv:2308.06097

    cs.CV

    RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

    Authors: Yangyang Xu, Shengfeng He, Kwan-Yee K. Wong, Ping Luo

    Abstract: GAN inversion is indispensable for applying the powerful editability of GAN to real images. However, existing methods invert video frames individually, often leading to undesired inconsistent results over time. In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneousl…

    Submitted 15 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICCV2023

  30. VideoPro: A Visual Analytics Approach for Interactive Video Programming

    Authors: Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu

    Abstract: Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high dimensional and complex temporal information in videos poses additional chall…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 11 pages, 7 figures

  31. Computational Approaches for Traditional Chinese Painting: From the "Six Principles of Painting" Perspective

    Authors: Wei Zhang, Jian-Wei Zhang, Kam Kwai Wong, Yifang Wang, Yingchaojie Feng, Luwei Wang, Wei Chen

    Abstract: Traditional Chinese Painting (TCP) is an invaluable cultural heritage resource and a unique visual art style. In recent years, increasing interest has been placed on digitalizing TCPs to preserve and revive the culture. The resulting digital copies have enabled the advancement of computational methods for structured and systematic understanding of TCPs. To explore this topic, we conducted an in-de…

    Submitted 26 July, 2023; originally announced July 2023.

    Report number: 39(2): 269-285 Mar. 2024

    Journal ref: Journal of Computer Science and Technology, 2024

  32. arXiv:2307.10281

    cs.CV

    Semi-supervised Cycle-GAN for face photo-sketch translation in the wild

    Authors: Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong

    Abstract: The performance of face photo-sketch translation has improved a lot thanks to deep neural networks. GAN-based methods trained on paired images can produce high-quality results under laboratory settings. Such paired datasets are, however, often very small and lack diversity. Meanwhile, Cycle-GANs trained with unpaired photo-sketch datasets suffer from the steganography phenomenon, which make…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 11 pages, 11 figures, 5 tables (+ 7 page appendix)

  33. PromptMagician: Interactive Prompt Engineering for Text-to-Image Creation

    Authors: Yingchaojie Feng, Xingbo Wang, Kam Kwai Wong, Sijia Wang, Yuhong Lu, Minfeng Zhu, Baicheng Wang, Wei Chen

    Abstract: Generative text-to-image models have gained great popularity among the public for their powerful capability to generate high-quality images based on natural language prompts. However, developing effective prompts for desired images can be challenging due to the complexity and ambiguity of natural language. This research proposes PromptMagician, a visual analysis system that helps users explore the…

    Submitted 15 August, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: Accepted full paper for IEEE VIS 2023

  34. arXiv:2306.03038

    cs.CV

    HeadSculpt: Crafting 3D Head Avatars with Text

    Authors: Xiao Han, Yukang Cao, Kai Han, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang, Kwan-Yee K. Wong

    Abstract: Recently, text-guided 3D generative methods have made remarkable advancements in producing high-quality textures and geometry, capitalizing on the proliferation of large vision-language and image diffusion models. However, existing methods still struggle to create high-fidelity 3D head avatars in two aspects: (1) They rely mostly on a pre-trained text-to-image diffusion model whilst missing the ne…

    Submitted 29 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: Webpage: https://brandonhan.uk/HeadSculpt/

  35. arXiv:2306.00971

    cs.CV cs.AI

    ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

    Authors: Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong

    Abstract: Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest. This task learns a novel concept (e.g., a unique toy), illustrated in a handful of images, into a generative model that captures fine visual details and generates photorealistic images based on textual embeddings. In this paper, we present ViCo, a novel lightweight plug-and-play met…

    Submitted 7 December, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Under review

  36. arXiv:2305.16322

    cs.CV cs.GR

    Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

    Authors: Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

    Abstract: Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challeng…

    Submitted 29 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Camera Ready, Code is available at https://github.com/ShihaoZhaoZSH/Uni-ControlNet

  37. arXiv:2304.13903

    eess.SP

    On Propagation Characteristics of Reconfigurable Surface Wave Platform: Simulation and Experimental Verification

    Authors: Z. Chu, K. F. Tong, K. K. Wong, C. B. Chae, C. H. Chan

    Abstract: Reconfigurable intelligent surface (RIS) as a smart reflector is revolutionizing research for next-generation wireless communications. Complementing this is a concept of using RIS as an efficient propagation medium for potentially superior path loss characteristics. Motivated by a recent porous surface architecture that facilitates reconfigurable pathways with cavities filled with fluid metal, thi…

    Submitted 2 August, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

    Comments: Submitted to IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, 2023

  38. arXiv:2304.06928

    cs.CV cs.AI

    CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery

    Authors: Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

    Abstract: We tackle the issue of generalized category discovery (GCD). GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data may contain instances from both novel categories and labelled classes. In this paper, we address the GCD problem with an unknown category number for the unlabelled data. We propose a framework, named CiPR, to bootst…

    Submitted 24 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted to TMLR. Code: https://github.com/haoosz/CiPR

  39. arXiv:2304.05635

    eess.IV cs.CV

    Unifying and Personalizing Weakly-supervised Federated Medical Image Segmentation via Adaptive Representation and Aggregation

    Authors: Li Lin, Jiewei Wu, Yixiang Liu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) enables multiple sites to collaboratively train powerful deep models without compromising data privacy and security. The statistical heterogeneity (e.g., non-IID data and domain shifts) is a primary obstacle in FL, impairing the generalization performance of the global model. Weakly supervised segmentation, which uses sparsely-grained (i.e., point-, bounding box-, scribble-…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures

  40. arXiv:2304.05011

    cs.HC cs.CL

    Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

    Authors: Luoxuan Weng, Minfeng Zhu, Kam Kwai Wong, Shi Liu, Jiashun Sun, Hang Zhu, Dongming Han, Wei Chen

    Abstract: Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences betwee…

    Submitted 11 April, 2023; originally announced April 2023.

  41. arXiv:2304.00916

    cs.CV

    DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

    Authors: Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

    Abstract: We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses. While encouraging results have been reported by recent methods on text-guided 3D common object generation, generating high-quality human avatars remains an open challenge due to the complexity of the human body's shape, pose, and appearance. We propose DreamAvatar to tack…

    Submitted 30 November, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Project page: https://yukangcao.github.io/DreamAvatar/

  42. arXiv:2304.00359

    cs.CV

    SeSDF: Self-evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction

    Authors: Yukang Cao, Kai Han, Kwan-Yee K. Wong

    Abstract: We address the problem of clothed human reconstruction from a single image or uncalibrated multi-view images. Existing methods struggle with reconstructing detailed geometry of a clothed human and often require a calibrated setting for multi-view reconstruction. We propose a flexible framework which, by leveraging the parametric SMPL-X model, can take an arbitrary number of input images to reconst…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: 25 pages, 21 figures

  43. arXiv:2303.15111

    cs.CV cs.AI

    Learning Attention as Disentangler for Compositional Zero-shot Learning

    Authors: Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

    Abstract: Compositional zero-shot learning (CZSL) aims at learning visual concepts (i.e., attributes and objects) from seen compositions and combining concept knowledge into unseen compositions. The key to CZSL is learning the disentanglement of the attribute-object composition. To this end, we propose to exploit cross-attentions as compositional disentanglers to learn disentangled concept embeddings. For e…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: CVPR 2023, available at https://haoosz.github.io/ade-czsl/

  44. arXiv:2302.09884

    cs.CV

    GlocalFuse-Depth: Fusing Transformers and CNNs for All-day Self-supervised Monocular Depth Estimation

    Authors: Zezheng Zhang, Ryan K. Y. Chan, Kenneth K. Y. Wong

    Abstract: In recent years, self-supervised monocular depth estimation has drawn much attention since it is free of depth annotations and has achieved remarkable results on standard benchmarks. However, most existing methods focus only on either daytime or nighttime images, so their performance degrades on the other domain because of the large domain shift between daytime and nighttime images. To address this…

    Submitted 20 February, 2023; originally announced February 2023.

  45. Anchorage: Visual Analysis of Satisfaction in Customer Service Videos via Anchor Events

    Authors: Kam Kwai Wong, Xingbo Wang, Yong Wang, Jianben He, Rong Zhang, Huamin Qu

    Abstract: Delivering customer services through video communications has brought new opportunities to analyze customer satisfaction for quality management. However, due to the lack of reliable self-reported responses, service providers are troubled by the inadequate estimation of customer services and the tedious investigation into multimodal video recordings. We introduce Anchorage, a visual analytics syste…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 13 pages. A preprint version of a publication at IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023

  46. arXiv:2302.01966

    cs.HC

    Towards an Understanding of Distributed Asymmetric Collaborative Visualization on Problem-solving

    Authors: Wai Tong, Meng Xia, Kam Kwai Wong, Doug A. Bowman, Ting-Chuen Pong, Huamin Qu, Yalong Yang

    Abstract: This paper provides empirical knowledge of the user experience of using collaborative visualization in a distributed asymmetrical setting through controlled user studies. With the ability to access various computing devices, such as Virtual Reality (VR) head-mounted displays, scenarios emerge when collaborators have to or prefer to use different computing environments in different places. However…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 11 pages, 12 figures, accepted at IEEE VR 2023

  47. XNLI: Explaining and Diagnosing NLI-based Visual Data Analysis

    Authors: Yingchaojie Feng, Xingbo Wang, Bo Pan, Kam Kwai Wong, Yi Ren, Shi Liu, Zihan Yan, Yuxin Ma, Huamin Qu, Wei Chen

    Abstract: Natural language interfaces (NLIs) enable users to flexibly specify analytical intentions in data visualization. However, diagnosing the visualization results without understanding the underlying generation process is challenging. Our research explores how to provide explanations for NLIs to help users locate the problems and further revise the queries. We present XNLI, an explainable NLI system f…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: 14 pages, 7 figures. A preprint version of a publication at IEEE Transactions on Visualization and Computer Graphics (TVCG), 2023

  48. arXiv:2212.05566

    cs.CV eess.IV

    YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation

    Authors: Li Lin, Linkai Peng, Huaqing He, Pujin Cheng, Jiewei Wu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Weakly-supervised learning (WSL) has been proposed to alleviate the conflict between data annotation cost and model performance through employing sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown promising performance, particularly in the image segmentation field. However, it is still a very challenging task due to the limited supervision, especially when only a small…

    Submitted 18 August, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: 20 pages, 15 figures, MEDIA accepted

  49. arXiv:2210.08936

    cs.CV cs.AI

    S$^3$-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint

    Authors: Wenqi Yang, Guanying Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

    Abstract: In this paper, we address the "dual problem" of multi-view scene reconstruction in which we utilize single-view images captured under different point lights to learn a neural scene representation. Different from existing single-view methods which can only recover a 2.5D scene representation (i.e., a normal / depth map for the visible surface), our method learns a neural reflectance field to repres…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022, Project page: https://ywq.github.io/s3nerf

  50. arXiv:2207.11406

    cs.CV cs.AI

    PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo

    Authors: Wenqi Yang, Guanying Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

    Abstract: Traditional multi-view photometric stereo (MVPS) methods are often composed of multiple disjoint stages, resulting in noticeable accumulated errors. In this paper, we present a neural inverse rendering method for MVPS based on implicit representation. Given multi-view images of a non-Lambertian object illuminated by multiple unknown directional lights, our method jointly estimates the geometry, ma…

    Submitted 22 December, 2022; v1 submitted 22 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, Project page: https://ywq.github.io/psnerf