Showing 1–50 of 109 results for author: Kwong, S

  1. arXiv:2506.11823  [pdf, ps, other]

    eess.IV cs.CV

    Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution

    Authors: Zhangkai Ni, Yang Zhang, Wenhan Yang, Hanli Wang, Shiqi Wang, Sam Kwong

    Abstract: Major efforts in data-driven image super-resolution (SR) primarily focus on expanding the receptive field of the model to better capture contextual information. However, these methods are typically implemented by stacking deeper networks or leveraging transformer-based attention mechanisms, which consequently increases model complexity. In contrast, model-driven methods based on the unfolding para… ▽ More

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: Accepted to IEEE Transactions on Image Processing

  2. arXiv:2505.17666  [pdf, ps, other]

    cs.CV

    Proto-FG3D: Prototype-based Interpretable Fine-Grained 3D Shape Classification

    Authors: Shuxian Ma, Zihao Dong, Runmin Cong, Sam Kwong, Xiuli Shao

    Abstract: Deep learning-based multi-view coarse-grained 3D shape classification has achieved remarkable success over the past decade, leveraging the powerful feature learning capabilities of CNN-based and ViT-based backbones. However, as a challenging research area critical for detailed shape understanding, fine-grained 3D classification remains understudied due to the limited discriminative information cap… ▽ More

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 11 pages, 2 figures, 5 tables; Submitted to BMVC2025

    ACM Class: I.4.0; I.5.0

  3. arXiv:2505.15581  [pdf, other]

    cs.CV cs.AI

    UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset

    Authors: Hua Li, Shijie Lian, Zhiyuan Li, Runmin Cong, Sam Kwong

    Abstract: With recent breakthroughs in large-scale modeling, the Segment Anything Model (SAM) has demonstrated significant potential in a variety of visual applications. However, due to the lack of underwater domain expertise, SAM and its variants face performance limitations in end-to-end underwater instance segmentation tasks, while their higher computational requirements further hinder their application… ▽ More

    Submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2505.02486  [pdf, ps, other]

    cs.LG cs.AI

    SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning

    Authors: Jinpeng Chen, Runmin Cong, Yuzhi Zhao, Hongzheng Yang, Guangneng Hu, Horace Ho Shing Ip, Sam Kwong

    Abstract: Multimodal Continual Instruction Tuning (MCIT) aims to enable Multimodal Large Language Models (MLLMs) to incrementally learn new tasks without catastrophic forgetting. In this paper, we explore forgetting in this context, categorizing it into superficial forgetting and essential forgetting. Superficial forgetting refers to cases where the model's knowledge may not be genuinely lost, but its respo… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  5. arXiv:2504.14472  [pdf, ps, other]

    math.DG math.AG

    Singular Lagrangians in the Hitchin moduli space and conformal limits

    Authors: Szehong Kwong

    Abstract: In the moduli space of semistable $\text{SL}(r, \mathbb{C})$-Higgs bundles, we show that there exists a sublocus of the upward flow through a polystable $\mathbb{C}^{*}$-fixed point, which is Lagrangian on its intersection with the stable locus. This intersection is always non-empty in the case when the Higgs field of the fixed point vanishes, or when the automorphism group of its polystable repre… ▽ More

    Submitted 19 April, 2025; originally announced April 2025.

    MSC Class: 58D27 (Primary) 14D20; 14D21; 32G13 (Secondary)

  6. arXiv:2504.13022  [pdf, other]

    cs.GR cs.CV

    CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation

    Authors: Xiangrui Liu, Xinju Wu, Shiqi Wang, Zhu Li, Sam Kwong

    Abstract: Gaussian splatting demonstrates proficiency for 3D scene modeling but suffers from substantial data volume due to inherent primitive redundancy. To enable future photorealistic 3D immersive visual communication applications, significant compression is essential for transmission over the existing Internet infrastructure. Hence, we propose Compressed Gaussian Splatting (CompGS++), a novel framework… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Submitted to a journal

  7. arXiv:2504.03718  [pdf, other]

    cs.LG cs.AI

    Task-Aware Parameter-Efficient Fine-Tuning of Large Pre-Trained Models at the Edge

    Authors: Senkang Hu, Yanan Ma, Yihang Tao, Zhengru Fang, Zihan Fang, Yiqin Deng, Sam Kwong, Yuguang Fang

    Abstract: Large language models (LLMs) have achieved remarkable success in various tasks, such as decision-making, reasoning, and question answering. They have been widely used in edge devices. However, fine-tuning LLMs to specific tasks at the edge is challenging due to the high computational cost and the limited storage and energy resources at the edge. To address this issue, we propose TaskEdge, a task-a… ▽ More

    Submitted 29 March, 2025; originally announced April 2025.

  8. arXiv:2503.00047  [pdf, other]

    eess.IV cs.CV eess.SP

    PCE-GAN: A Generative Adversarial Network for Point Cloud Attribute Quality Enhancement based on Optimal Transport

    Authors: Tian Guo, Hui Yuan, Qi Liu, Honglei Su, Raouf Hamzaoui, Sam Kwong

    Abstract: Point cloud compression significantly reduces data volume but sacrifices reconstruction quality, highlighting the need for advanced quality enhancement techniques. Most existing approaches focus primarily on point-to-point fidelity, often neglecting the importance of perceptual quality as interpreted by the human visual system. To address this issue, we propose a generative adversarial network for… ▽ More

    Submitted 26 February, 2025; originally announced March 2025.

  9. arXiv:2502.15174  [pdf, other]

    eess.IV cs.CV

    FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression

    Authors: Shiqi Jiang, Hui Yuan, Shuai Li, Huanqiang Zeng, Sam Kwong

    Abstract: The learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which possess distinct characteristics such as sharp edges, repetitive patterns, embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compre… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

  10. arXiv:2502.07807  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    CP-Guard+: A New Paradigm for Malicious Agent Detection and Defense in Collaborative Perception

    Authors: Senkang Hu, Yihang Tao, Zihan Fang, Guowen Xu, Yiqin Deng, Sam Kwong, Yuguang Fang

    Abstract: Collaborative perception (CP) is a promising method for safe connected and autonomous driving, which enables multiple vehicles to share sensing information to enhance perception performance. However, compared with single-vehicle perception, the openness of a CP system makes it more vulnerable to malicious attacks that can inject malicious information to mislead the perception of an ego vehicle, re… ▽ More

    Submitted 7 February, 2025; originally announced February 2025.

  11. arXiv:2501.01481  [pdf, other]

    eess.IV cs.CV

    Unleashing Correlation and Continuity for Hyperspectral Reconstruction from RGB Images

    Authors: Fuxiang Feng, Runmin Cong, Shoushui Wei, Yipeng Zhang, Jun Li, Sam Kwong, Wei Zhang

    Abstract: Reconstructing Hyperspectral Images (HSI) from RGB images can yield high spatial resolution HSI at a lower cost, demonstrating significant application potential. This paper reveals that local correlation and global continuity of the spectral characteristics are crucial for HSI reconstruction tasks. Therefore, we fully explore these inter-spectral relationships and propose a Correlation and Continu… ▽ More

    Submitted 2 January, 2025; originally announced January 2025.

  12. arXiv:2412.15847  [pdf, other]

    eess.IV cs.CV

    Image Quality Assessment: Enhancing Perceptual Exploration and Interpretation with Collaborative Feature Refinement and Hausdorff distance

    Authors: Xuekai Wei, Junyu Zhang, Qinlin Hu, Mingliang Zhou, Yong Feng, Weizhi Xian, Huayan Pu, Sam Kwong

    Abstract: Current full-reference image quality assessment (FR-IQA) methods often fuse features from reference and distorted images, overlooking that color and luminance distortions occur mainly at low frequencies, whereas edge and texture distortions occur at high frequencies. This work introduces a pioneering training-free FR-IQA method that accurately predicts image quality in alignment with the human vis… ▽ More

    Submitted 20 December, 2024; originally announced December 2024.

  13. arXiv:2412.15677  [pdf, other]

    cs.CV cs.AI

    AI-generated Image Quality Assessment in Visual Communication

    Authors: Yu Tian, Yixuan Li, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Sam Kwong

    Abstract: Assessing the quality of artificial intelligence-generated images (AIGIs) plays a crucial role in their application in real-world scenarios. However, traditional image quality assessment (IQA) algorithms primarily focus on low-level visual perception, while existing IQA works on AIGIs overemphasize the generated content itself, neglecting its effectiveness in real-world applications. To bridge thi… ▽ More

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: AAAI-2025; Project page: https://github.com/ytian73/AIGI-VC

  14. arXiv:2412.12000  [pdf, other]

    cs.AI

    CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird's Eye View Perception

    Authors: Senkang Hu, Yihang Tao, Guowen Xu, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Collaborative Perception (CP) has emerged as a promising technique for autonomous driving, where multiple connected and autonomous vehicles (CAVs) share their perception information to enhance the overall perception performance and expand the perception range. However, in CP, the ego CAV needs to receive messages from its collaborators, which makes it vulnerable to attacks by malicious agents. For example, a… ▽ More

    Submitted 23 May, 2025; v1 submitted 16 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI'25

  15. arXiv:2411.09308  [pdf, other]

    eess.IV cs.CV

    DT-JRD: Deep Transformer based Just Recognizable Difference Prediction Model for Video Coding for Machines

    Authors: Junqi Liu, Yun Zhang, Xiaoqi Wang, Xu Long, Sam Kwong

    Abstract: Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine vision oriented visual signal processing. In this paper, we propose a Deep Transformer based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used to reduce the coding bit rate while main… ▽ More

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: Submitted to IEEE Transactions on Multimedia

  16. arXiv:2410.05577  [pdf, other]

    cs.CV

    Underwater Object Detection in the Era of Artificial Intelligence: Current, Challenge, and Future

    Authors: Long Chen, Yuzhi Huang, Junyu Dong, Qi Xu, Sam Kwong, Huimin Lu, Huchuan Lu, Chongyi Li

    Abstract: Underwater object detection (UOD), aiming to identify and localise the objects in underwater images or videos, presents significant challenges due to the optical distortion, water turbidity, and changing illumination in underwater scenes. In recent years, artificial intelligence (AI) based methods, especially deep learning methods, have shown promising performance in UOD. To further facilitate fut… ▽ More

    Submitted 7 October, 2024; originally announced October 2024.

  17. arXiv:2409.11711  [pdf, other]

    eess.IV cs.CV

    LFIC-DRASC: Deep Light Field Image Compression Using Disentangled Representation and Asymmetrical Strip Convolution

    Authors: Shiyu Feng, Yun Zhang, Linwei Zhu, Sam Kwong

    Abstract: Light-Field (LF) images are an emerging form of 4D light-ray data capable of realistically presenting the spatial and angular information of a 3D scene. However, the large data volume of LF images becomes the most challenging issue in real-time processing, transmission, and storage. In this paper, we propose an end-to-end deep LF Image Compression method Using Disentangled Representation and Asymmetrical… ▽ More

    Submitted 18 September, 2024; originally announced September 2024.

  18. arXiv:2409.10293  [pdf, other]

    eess.IV cs.CV

    SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds

    Authors: Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong

    Abstract: We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference… ▽ More

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 136 pages, 13 figures

  19. arXiv:2409.04123  [pdf, other]

    eess.IV

    Feature Compression for Cloud-Edge Multimodal 3D Object Detection

    Authors: Chongzhen Tian, Zhengxin Li, Hui Yuan, Raouf Hamzaoui, Liquan Shen, Sam Kwong

    Abstract: Machine vision systems, which can efficiently manage extensive visual perception tasks, are becoming increasingly popular in industrial production and daily life. Due to the challenge of simultaneously obtaining accurate depth and texture information with a single sensor, multimodal data captured by cameras and LiDAR is commonly used to enhance performance. Additionally, cloud-edge cooperation has… ▽ More

    Submitted 6 September, 2024; originally announced September 2024.

  20. arXiv:2408.08093  [pdf, other]

    cs.CV cs.MM

    When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding

    Authors: Pingping Zhang, Jinlong Li, Kecheng Chen, Meng Wang, Long Xu, Haoliang Li, Nicu Sebe, Sam Kwong, Shiqi Wang

    Abstract: Existing codecs are designed to eliminate intrinsic redundancies to create a compact representation for compression. However, strong external priors from Multimodal Large Language Models (MLLMs) have not been explicitly explored in video compression. Herein, we introduce a unified paradigm for Cross-Modality Video Coding (CMVC), which is a pioneering approach to explore multimodality representatio… ▽ More

    Submitted 14 February, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

  21. arXiv:2408.03624  [pdf, other]

    cs.CV

    AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

    Authors: Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Ramp merging is one of the bottlenecks in traffic systems, which commonly causes traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language… ▽ More

    Submitted 24 April, 2025; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Mobile Computing (TMC)

  22. arXiv:2408.02340  [pdf]

    cs.NE

    A Landscape-Aware Differential Evolution for Multimodal Optimization Problems

    Authors: Guo-Yun Lin, Zong-Gan Chen, Chuanbin Liu, Yuncheng Jiang, Sam Kwong, Jun Zhang, Zhi-Hui Zhan

    Abstract: How to simultaneously locate multiple global peaks and achieve certain accuracy on the found peaks are two key challenges in solving multimodal optimization problems (MMOPs). In this paper, a landscape-aware differential evolution (LADE) algorithm is proposed for MMOPs, which utilizes landscape knowledge to maintain sufficient diversity and provide efficient search guidance. In detail, the landsca… ▽ More

    Submitted 25 February, 2025; v1 submitted 5 August, 2024; originally announced August 2024.

  23. arXiv:2407.16354  [pdf, other]

    cs.CV cs.LG

    Strike a Balance in Continual Panoptic Segmentation

    Authors: Jinpeng Chen, Runmin Cong, Yuxuan Luo, Horace Ho Shing Ip, Sam Kwong

    Abstract: This study explores the emerging area of continual panoptic segmentation, highlighting three key balances. First, we introduce past-class backtrace distillation to balance the stability of existing knowledge with the adaptability to new information. This technique retraces the features associated with past classes based on the final label assignment results, performing knowledge distillation targe… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  24. arXiv:2406.06039  [pdf, other]

    cs.CV

    Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

    Authors: Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong

    Abstract: With the breakthrough of large models, the Segment Anything Model (SAM) and its extensions have been applied to diverse computer vision tasks. Underwater salient instance segmentation is a foundational and vital step for various underwater vision tasks, which often suffer from low segmentation accuracy due to the complex underwater circumstances and the adaptive ability of models. Moreov… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024, Code released at: https://github.com/LiamLian0727/USIS10K

  25. arXiv:2404.09458  [pdf, other]

    cs.CV cs.GR

    CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting

    Authors: Xiangrui Liu, Xinju Wu, Pingping Zhang, Shiqi Wang, Zhu Li, Sam Kwong

    Abstract: Gaussian splatting, renowned for its exceptional rendering quality and efficiency, has emerged as a prominent technique in 3D scene representation. However, the substantial data volume of Gaussian splatting impedes its practical utility in real-world applications. Herein, we propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS), which harnesses compact Gaussian… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Submitted to a conference

  26. arXiv:2403.07290  [pdf, other]

    cs.CV

    Learning Hierarchical Color Guidance for Depth Map Super-Resolution

    Authors: Runmin Cong, Ronghui Sheng, Hao Wu, Yulan Guo, Yunchao Wei, Wangmeng Zuo, Yao Zhao, Sam Kwong

    Abstract: Color information is the most commonly used prior knowledge for depth map super-resolution (DSR), which can provide high-frequency boundary guidance for detail restoration. However, its role and functionality in DSR have not been fully developed. In this paper, we rethink the utilization of color information and propose a hierarchical color guidance network to achieve DSR. On the one hand, the low… ▽ More

    Submitted 7 December, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  27. arXiv:2401.01563  [pdf, other]

    cs.NE

    Towards Multi-Objective High-Dimensional Feature Selection via Evolutionary Multitasking

    Authors: Yinglan Feng, Liang Feng, Songbai Liu, Sam Kwong, Kay Chen Tan

    Abstract: Evolutionary Multitasking (EMT) paradigm, an emerging research topic in evolutionary computation, has been successfully applied in solving high-dimensional feature selection (FS) problems recently. However, existing EMT-based FS methods suffer from several limitations, such as a single mode of multitask generation, conducting the same generic evolutionary search for all tasks, relying on implicit… ▽ More

    Submitted 3 January, 2024; originally announced January 2024.

  28. arXiv:2312.09095  [pdf, other]

    cs.CV

    ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field

    Authors: Zhangkai Ni, Peiqi Yang, Wenhan Yang, Hanli Wang, Lin Ma, Sam Kwong

    Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive potential in synthesizing novel views from dense input, however, their effectiveness is challenged when dealing with sparse input. Existing approaches that incorporate additional depth or semantic supervision can alleviate this issue to an extent. However, the process of supervision collection is not only costly but also potentially inaccu… ▽ More

    Submitted 14 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

  29. arXiv:2311.16754  [pdf, other]

    cs.CV cs.AI

    Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving

    Authors: Senkang Hu, Zhengru Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Collaborative perception has recently gained significant attention in autonomous driving, improving perception quality by enabling the exchange of additional information among vehicles. However, deploying collaborative perception systems can lead to domain shifts due to diverse environmental conditions and data heterogeneity among connected and autonomous vehicles (CAVs). To address these challeng… ▽ More

    Submitted 24 November, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems (TITS)

  30. arXiv:2308.11627  [pdf, other]

    eess.SP cs.AI cs.CV eess.IV eess.SY

    Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

    Authors: Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong

    Abstract: The state-of-the-art smart city calls for economical yet efficient energy management over large-scale networks, especially for the electric power system. Monitoring, analyzing, and controlling the electric loads of all users in the system is a critical issue. In this paper, we employ popular computer vision techniques from AI to design a non-invasive load monitoring method for smart electric ener… ▽ More

    Submitted 8 August, 2023; originally announced August 2023.

  31. arXiv:2308.08935  [pdf, other]

    cs.CV

    SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection

    Authors: Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong

    Abstract: Despite significant progress in shadow detection, current methods still struggle with the adverse impact of background color, which may lead to errors when shadows are present on complex backgrounds. Drawing inspiration from the human visual system, we treat the input shadow image as a composition of a background layer and a shadow layer, and design a Style-guided Dual-layer Disentanglement Networ… ▽ More

    Submitted 7 December, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  32. arXiv:2308.08930  [pdf, other]

    cs.CV

    Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

    Authors: Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong

    Abstract: By integrating complementary information from RGB image and depth map, the ability of salient object detection (SOD) for complex and challenging scenes can be improved. In recent years, the important role of Convolutional Neural Networks (CNNs) in feature extraction and cross-modality interaction has been fully explored, but it is still insufficient in modeling global long-range dependencies of se… ▽ More

    Submitted 7 December, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  33. arXiv:2306.12298  [pdf, other]

    cs.CV cs.LG eess.IV

    StarVQA+: Co-training Space-Time Attention for Video Quality Assessment

    Authors: Fengchuang Xing, Yuan-Gen Wang, Weixuan Tang, Guopu Zhu, Sam Kwong

    Abstract: Self-attention based Transformer has achieved great success in many computer vision tasks. However, its application to video quality assessment (VQA) has not been satisfactory so far. Evaluating the quality of in-the-wild videos is challenging due to the unknown of pristine reference and shooting distortion. This paper presents a co-trained Space-Time Attention network for the VQA problem, termed… ▽ More

    Submitted 21 June, 2023; originally announced June 2023.

  34. arXiv:2306.08918  [pdf, other]

    eess.IV cs.CV

    PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators

    Authors: Runmin Cong, Wenyu Yang, Wei Zhang, Chongyi Li, Chun-Le Guo, Qingming Huang, Sam Kwong

    Abstract: Due to the light absorption and scattering induced by the water medium, underwater images usually suffer from some degradation problems, such as low contrast, color distortion, and blurring details, which aggravate the difficulty of downstream underwater understanding tasks. Therefore, how to obtain clear and visually pleasant images has become a common concern of people, and the task of underwate… ▽ More

    Submitted 7 December, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 8 pages, 4 figures, Accepted by IEEE Transactions on Image Processing 2023

  35. Geometric Prior Based Deep Human Point Cloud Geometry Compression

    Authors: Xinju Wu, Pingping Zhang, Meng Wang, Peilin Chen, Shiqi Wang, Sam Kwong

    Abstract: The emergence of digital avatars has raised an exponential increase in the demand for human point clouds with realistic and intricate details. The compression of such data becomes challenging with overwhelming data amounts comprising millions of points. Herein, we leverage the human geometric prior in geometry redundancy removal of point clouds, greatly promoting the compression performance. More… ▽ More

    Submitted 25 March, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted by TCSVT 2024

  36. arXiv:2212.12378  [pdf, other]

    cs.CV

    Multi-Projection Fusion and Refinement Network for Salient Object Detection in 360° Omnidirectional Image

    Authors: Runmin Cong, Ke Huang, Jianjun Lei, Yao Zhao, Qingming Huang, Sam Kwong

    Abstract: Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality technology, 360° omnidirectional image has been widely used, but the SOD task in 360° omnidirectional image is seldom studied due to its severe distortions and complex scenes. In this paper, we propose a Multi-Projection Fusion and Refinement Network (MPFR-Net)… ▽ More

    Submitted 23 December, 2022; originally announced December 2022.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems 2022

  37. arXiv:2211.07891  [pdf, other]

    cs.CV

    Feedback Chain Network For Hippocampus Segmentation

    Authors: Heyu Huang, Runmin Cong, Lianhe Yang, Ling Du, Cong Wang, Sam Kwong

    Abstract: The hippocampus plays a vital role in the diagnosis and treatment of many neurological disorders. In recent years, deep learning technology has made great progress in the field of medical image segmentation, and the performance of related tasks has steadily improved. In this paper, we focus on the hippocampus segmentation task and propose a novel hierarchical feedback chain network. The feedb… ▽ More

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted by ACM TOMM 2022

  38. arXiv:2210.05912  [pdf, other]

    cs.CV

    PSNet: Parallel Symmetric Network for Video Salient Object Detection

    Authors: Runmin Cong, Weiyu Song, Jianjun Lei, Guanghui Yue, Yao Zhao, Sam Kwong

    Abstract: For the video salient object detection (VSOD) task, how to excavate the information from the appearance modality and the motion modality has always been a topic of great concern. The two-stream structure, including an RGB appearance stream and an optical flow motion stream, has been widely used as a typical pipeline for VSOD tasks, but the existing methods usually only use motion features to unidi… ▽ More

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence 2022, 13 pages, 8 figures

  39. arXiv:2210.04266  [pdf, other]

    cs.CV

    Does Thermal Really Always Matter for RGB-T Salient Object Detection?

    Authors: Runmin Cong, Kepu Zhang, Chen Zhang, Feng Zheng, Yao Zhao, Qingming Huang, Sam Kwong

    Abstract: In recent years, RGB-T salient object detection (SOD) has attracted continuous attention, as introducing thermal images makes it possible to identify salient objects in environments such as low light. However, most existing RGB-T SOD models focus on how to perform cross-modality feature fusion, ignoring whether the thermal image really always matters in the SOD task. Starting from the defini… ▽ More

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: Accepted by IEEE Trans. Multimedia 2022, 13 pages, 9 figures

  40. arXiv:2210.04158  [pdf, other]

    eess.IV cs.CV

    HVS Revisited: A Comprehensive Video Quality Assessment Framework

    Authors: Ao-Xiang Zhang, Yuan-Gen Wang, Weixuan Tang, Leida Li, Sam Kwong

    Abstract: Video quality is a primary concern for video service providers. In recent years, video quality assessment (VQA) techniques based on deep convolutional neural networks (CNNs) have developed rapidly. Although existing works attempt to introduce knowledge of the human visual system (HVS) into VQA, limitations remain that prevent the full exploitation of HVS, including an… ▽ More

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: 13 pages, 5 figures, Journal paper

  41. arXiv:2209.05856   

    cs.CV cs.LG

    Just Noticeable Difference Modeling for Face Recognition System

    Authors: Yu Tian, Zhangkai Ni, Baoliang Chen, Shurun Wang, Shiqi Wang, Hanli Wang, Sam Kwong

    Abstract: High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in the performance degradation… ▽ More

    Submitted 28 September, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: MegaFace dataset we used in the manuscript are no longer publicly available

  42. arXiv:2209.05321  [pdf, other]

    cs.CV

    Deep Feature Statistics Mapping for Generalized Screen Content Image Quality Assessment

    Authors: Baoliang Chen, Hanwei Zhu, Lingyu Zhu, Shiqi Wang, Sam Kwong

    Abstract: The statistical regularities of natural images, referred to as natural scene statistics, play an important role in no-reference image quality assessment. However, it has been widely acknowledged that screen content images (SCIs), which are typically computer generated, do not hold such statistics. Here we make the first attempt to learn the statistics of SCIs, based upon which the quality of SCIs… ▽ More

    Submitted 21 April, 2024; v1 submitted 12 September, 2022; originally announced September 2022.

  43. arXiv:2209.02957  [pdf, other]

    cs.CV

    A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels

    Authors: Runmin Cong, Qi Qin, Chen Zhang, Qiuping Jiang, Shiqi Wang, Yao Zhao, Sam Kwong

    Abstract: Fully-supervised salient object detection (SOD) methods have made great progress, but such methods often rely on a large number of pixel-level annotations, which are time-consuming and labour-intensive. In this paper, we focus on a new weakly-supervised SOD task under hybrid labels, where the supervision labels include a large number of coarse labels generated by the traditional unsupervised metho…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2022

  44. arXiv:2209.02934  [pdf, other]

    eess.IV cs.CV

    Boundary Guided Semantic Learning for Real-time COVID-19 Lung Infection Segmentation System

    Authors: Runmin Cong, Yumo Zhang, Ning Yang, Haisheng Li, Xueqi Zhang, Ruochen Li, Zewen Chen, Yao Zhao, Sam Kwong

    Abstract: The coronavirus disease 2019 (COVID-19) continues to have a negative impact on healthcare systems around the world, though the vaccines have been developed and national vaccination coverage rate is steadily increasing. At the current stage, automatically segmenting the lung infection area from CT images is essential for the diagnosis and treatment of COVID-19. Thanks to the development of deep lea…

    Submitted 7 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE Transactions on Consumer Electronics 2022

  45. arXiv:2209.02285  [pdf, other]

    cs.CV eess.IV

    High Dynamic Range Image Quality Assessment Based on Frequency Disparity

    Authors: Yue Liu, Zhangkai Ni, Shiqi Wang, Hanli Wang, Sam Kwong

    Abstract: In this paper, a novel and effective image quality assessment (IQA) algorithm based on frequency disparity for high dynamic range (HDR) images is proposed, termed as local-global frequency feature-based model (LGFM). Motivated by the assumption that the human visual system is highly adapted for extracting structural information and partial frequencies when perceiving the visual scene, the Gabor an…

    Submitted 6 September, 2022; originally announced September 2022.

  46. arXiv:2208.10077  [pdf, other]

    cs.CV cs.AI

    Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

    Authors: Stephen Su, Samuel Kwong, Qingyu Zhao, De-An Huang, Juan Carlos Niebles, Ehsan Adeli

    Abstract: There has been an increasing interest in multi-task learning for video understanding in recent years. In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on. We employ Necessary Condition Analysis (NCA) as a data-driven approach for deciding what…

    Submitted 22 August, 2022; originally announced August 2022.

  47. arXiv:2208.08145  [pdf, other]

    cs.CV

    Stereo Superpixel Segmentation Via Decoupled Dynamic Spatial-Embedding Fusion Network

    Authors: Hua Li, Junyan Liang, Ruiqi Wu, Runmin Cong, Junhui Wu, Sam Tak Wu Kwong

    Abstract: Stereo superpixel segmentation aims at grouping the discretizing pixels into perceptual regions through left and right views more collaboratively and efficiently. Existing superpixel segmentation algorithms mostly utilize color and spatial features as input, which may impose strong constraints on spatial information while utilizing the disparity information in terms of stereo image pairs. To allev…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 11 pages, 13 figures

  48. DeepWSD: Projecting Degradations in Perceptual Space to Wasserstein Distance in Deep Feature Space

    Authors: Xingran Liao, Baoliang Chen, Hanwei Zhu, Shiqi Wang, Mingliang Zhou, Sam Kwong

    Abstract: Existing deep learning-based full-reference IQA (FR-IQA) models usually predict the image quality in a deterministic way by explicitly comparing the features, gauging how severely distorted an image is by how far the corresponding feature lies from the space of the reference images. Herein, we look at this problem from a different viewpoint and propose to model the quality degradation in perceptua…

    Submitted 4 August, 2022; originally announced August 2022.

    Comments: ACM Multimedia 2022 accepted thesis

  49. Consistent Quality Oriented Rate Control in HEVC via Balancing Intra and Inter Frame Coding

    Authors: Wei Gao, Qiuping Jiang, Ronggang Wang, Siwei Ma, Ge Li, Sam Kwong

    Abstract: Consistent quality oriented rate control in video coding has attracted much more attention. However, the existing efforts only focus on decreasing variations between every two adjacent frames, but neglect coding trade-off problem between intra and inter frames. In this paper, we deal with it from a new perspective, where intra frame quantization parameter (IQP) and rate control are optimized for b…

    Submitted 29 July, 2022; originally announced August 2022.

    Comments: 10 pages

    Journal ref: in IEEE Transactions on Industrial Informatics, vol. 18, no. 3, pp. 1594-1604, March 2022

  50. arXiv:2207.08114  [pdf, other]

    eess.IV cs.CV

    BCS-Net: Boundary, Context and Semantic for Automatic COVID-19 Lung Infection Segmentation from CT Images

    Authors: Runmin Cong, Haowei Yang, Qiuping Jiang, Wei Gao, Haisheng Li, Cong Wang, Yao Zhao, Sam Kwong

    Abstract: The spread of COVID-19 has brought a huge disaster to the world, and the automatic segmentation of infection regions can help doctors to make diagnosis quickly and reduce workload. However, there are several challenges for the accurate and complete segmentation, such as the scattered infection area distribution, complex background noises, and blurred segmentation boundaries. To this end, in this p…

    Submitted 17 July, 2022; originally announced July 2022.

    Comments: Accepted by IEEE Transactions on Instrumentation and Measurement 2022, Code: https://github.com/rmcong/BCS-Net-TIM22