
Showing 1–8 of 8 results for author: Lau, K W

Searching in archive cs.
  1. arXiv:2504.09516  [pdf, other]

    cs.SD cs.CV eess.AS

    FSSUAVL: A Discriminative Framework using Vision Models for Federated Self-Supervised Audio and Image Understanding

    Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Ma Lan, JiaJun Shen

    Abstract: Recent studies have demonstrated that vision models can effectively learn multimodal audio-image representations when paired. However, the challenge of enabling deep models to learn representations from unpaired modalities remains unresolved. This issue is especially pertinent in scenarios like Federated Learning (FL), where data is often decentralized, heterogeneous, and lacks a reliable guarante…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 8 pages

  2. arXiv:2409.15898  [pdf, other]

    cs.LG cs.CV cs.DC

    FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning

    Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusmão, Lai-Man Po, Lan Ma, Yuyang Xie

    Abstract: Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to su…

    Submitted 10 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: Accepted by ACCV 2024

  3. arXiv:2404.13551  [pdf, other]

    cs.SD eess.AS

    AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition

    Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po

    Abstract: Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting their deployment on low-end edge devices. Motivated by the success of efficient vision models like InceptionNeXt and ConvNeXt, we propose AudioRepInceptionNeXt,…

    Submitted 21 April, 2024; originally announced April 2024.

  4. arXiv:2402.02889  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Exploring Federated Self-Supervised Learning for General Purpose Audio Understanding

    Authors: Yasar Abbas Ur Rehman, Kin Wai Lau, Yuyang Xie, Lan Ma, Jiajun Shen

    Abstract: The integration of Federated Learning (FL) and Self-supervised Learning (SSL) offers a unique and synergetic combination to exploit the audio data for general-purpose audio understanding, without compromising user data privacy. However, rare efforts have been made to investigate the SSL models in the FL regime for general-purpose audio understanding, especially when the training data is generated…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2310.16587  [pdf, other]

    cs.LG cs.AI cs.CV

    Adaptive Uncertainty Estimation via High-Dimensional Testing on Latent Representations

    Authors: Tsai Hor Chan, Kin Wai Lau, Jiajun Shen, Guosheng Yin, Lequan Yu

    Abstract: Uncertainty estimation aims to evaluate the confidence of a trained deep neural network. However, existing uncertainty estimation approaches rely on low-dimensional distributional assumptions and thus suffer from the high dimensionality of latent features. Existing approaches tend to focus on uncertainty on discrete classification probabilities, which leads to poor generalizability to uncertainty…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  6. arXiv:2309.01439  [pdf, other]

    cs.CV

    Large Separable Kernel Attention: Rethinking the Large Kernel Attention Design in CNN

    Authors: Kin Wai Lau, Lai-Man Po, Yasar Abbas Ur Rehman

    Abstract: Visual Attention Networks (VAN) with Large Kernel Attention (LKA) modules have been shown to provide remarkable performance, that surpasses Vision Transformers (ViTs), on a range of vision-based tasks. However, the depth-wise convolutional layer in these LKA modules incurs a quadratic increase in the computational and memory footprints with increasing convolutional kernel size. To mitigate these p…

    Submitted 19 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  7. arXiv:2307.07265  [pdf, other]

    cs.SD cs.AI eess.AS

    AudioInceptionNeXt: TCL AI LAB Submission to EPIC-SOUND Audio-Based-Interaction-Recognition Challenge 2023

    Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Yuyang Xie, Lan Ma

    Abstract: This report presents the technical details of our submission to the 2023 Epic-Kitchen EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. The task is to learn the mapping from audio samples to their corresponding action labels. To achieve this goal, we propose a simple yet effective single-stream CNN-based architecture called AudioInceptionNeXt that operates on the time-frequency log-mel-sp…

    Submitted 14 July, 2023; originally announced July 2023.

  8. arXiv:2008.07742  [pdf, other]

    eess.IV cs.CV

    UDC 2020 Challenge on Image Restoration of Under-Display Camera: Methods and Results

    Authors: Yuqian Zhou, Michael Kwan, Kyle Tolentino, Neil Emerton, Sehoon Lim, Tim Large, Lijiang Fu, Zhihong Pan, Baopu Li, Qirui Yang, Yihao Liu, Jigang Tang, Tao Ku, Shibin Ma, Bingnan Hu, Jiarong Wang, Densen Puthussery, Hrishikesh P S, Melvin Kuriakose, Jiji C V, Varun Sundar, Sumanth Hegde, Divya Kothandaraman, Kaushik Mitra, Akashdeep Jassal, et al. (20 additional authors not shown)

    Abstract: This paper is the report of the first Under-Display Camera (UDC) image restoration challenge in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly-collected database of Under-Display Camera. The challenge tracks correspond to two types of display: a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED). Along with about 150 teams registered the challenge, ei…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: 15 pages