Showing 1–50 of 216 results for author: Lam, K

Searching in archive cs.
  1. arXiv:2509.15294  [pdf, ps, other]

    quant-ph cs.DS cs.ET math.OC

    Classical and Quantum Heuristics for the Binary Paint Shop Problem

    Authors: V Vijendran, Dax Enshan Koh, Ping Koy Lam, Syed M Assad

    Abstract: The Binary Paint Shop Problem (BPSP) is an $\mathsf{APX}$-hard optimisation problem in automotive manufacturing: given a sequence of $2n$ cars, comprising $n$ distinct models each appearing twice, the task is to decide which of two colours to paint each car so that the two occurrences of each model are painted differently, while minimising consecutive colour swaps. The key performance metric is th…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: 30 Pages and 3 Figures

  2. arXiv:2509.14594  [pdf, ps, other]

    cs.AI

    SynBench: A Benchmark for Differentially Private Text Generation

    Authors: Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath

    Abstract: Data-driven decision support in high-stakes domains like healthcare and finance faces significant barriers to data sharing due to regulatory, institutional, and privacy concerns. While recent generative AI models, such as large language models, have shown impressive performance in open-domain tasks, their adoption in sensitive environments remains limited by unpredictable behaviors and insufficien…

    Submitted 17 September, 2025; originally announced September 2025.

    Comments: 15 pages

  3. Bona fide Cross Testing Reveals Weak Spot in Audio Deepfake Detection Systems

    Authors: Chin Yuen Kwok, Jia Qi Yip, Zhen Qiu, Chi Hung Chi, Kwok Yan Lam

    Abstract: Audio deepfake detection (ADD) models are commonly evaluated using datasets that combine multiple synthesizers, with performance reported as a single Equal Error Rate (EER). However, this approach disproportionately weights synthesizers with more samples, underrepresenting others and reducing the overall reliability of EER. Additionally, most ADD datasets lack diversity in bona fide speech, often…

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: Published in Interspeech 2025

  4. arXiv:2509.04139  [pdf]

    cs.IR cs.AI

    Enhancing Technical Documents Retrieval for RAG

    Authors: Songjiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kaiwen Xue, Kwan-Ho Lin, Yan-Ming Choi, Vincent Ng, Kin-Man Lam

    Abstract: In this paper, we introduce Technical-Embeddings, a novel framework designed to optimize semantic retrieval in technical documentation, with applications in both hardware and software development. Our approach addresses the challenges of understanding and retrieving complex technical content by leveraging the capabilities of Large Language Models (LLMs). First, we enhance user queries by generatin…

    Submitted 4 September, 2025; originally announced September 2025.

  5. arXiv:2508.21080  [pdf, ps, other]

    cs.CV cs.RO

    2COOOL: 2nd Workshop on the Challenge Of Out-Of-Label Hazards in Autonomous Driving

    Authors: Ali K. AlShami, Ryan Rabinowitz, Maged Shoman, Jianwu Fang, Lukas Picek, Shao-Yuan Lo, Steve Cruz, Khang Nhut Lam, Nachiket Kamod, Lei-Lei Li, Jugal Kalita, Terrance E. Boult

    Abstract: As the computer vision community advances autonomous driving algorithms, integrating vision-based insights with sensor data remains essential for improving perception, decision making, planning, prediction, simulation, and control. Yet we must ask: Why don't we have entirely safe self-driving cars yet? A key part of the answer lies in addressing novel scenarios, one of the most critical barriers t…

    Submitted 18 August, 2025; originally announced August 2025.

    Comments: 11 pages, 2 figures, Accepted to ICCV 2025 Workshop on Out-of-Label Hazards in Autonomous Driving (2COOOL)

    MSC Class: 68T45 (Machine vision and scene understanding) ACM Class: I.2.10; I.4.8

  6. arXiv:2508.21004  [pdf, ps, other]

    cs.CL

    Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution

    Authors: Chen Chen, Yuchen Sun, Jiaxin Gao, Xueluan Gong, Qian Wang, Ziyao Wang, Yongsen Zheng, Kwok-Yan Lam

    Abstract: Large language models (LLMs) have seen significant advancements, achieving superior performance in various Natural Language Processing (NLP) tasks. However, they remain vulnerable to backdoor attacks, where models behave normally for standard queries but generate harmful responses or unintended output when specific triggers are activated. Existing backdoor defenses either lack comprehensiveness, f…

    Submitted 28 August, 2025; originally announced August 2025.

  7. arXiv:2508.18845  [pdf, ps, other]

    cs.IT

    On decoding extended Han-Zhang codes

    Authors: Yang Li, Zhenliang Lu, San Ling, Shixin Zhu, Kwok Yan Lam

    Abstract: Extended Han-Zhang codes are a class of linear codes where each code is either a non-generalized Reed-Solomon (non-GRS) maximum distance separable (MDS) code or a near MDS (NMDS) code. They have important applications in communication, cryptography, and storage systems. While many algebraic properties and explicit constructions of extended Han-Zhang codes have been well studied in the literature,…

    Submitted 26 August, 2025; originally announced August 2025.

    Comments: An extension of part of the results in arXiv:2401.04360v2

  8. arXiv:2508.15457  [pdf, ps, other]

    cs.CV

    Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework

    Authors: Zongqi He, Hanmin Li, Kin-Chung Chan, Yushen Zuo, Hao Xie, Zhe Xiao, Jun Xiao, Kin-Man Lam

    Abstract: 3D Gaussian Splatting (3DGS) has demonstrated remarkable real-time performance in novel view synthesis, yet its effectiveness relies heavily on dense multi-view inputs with precisely known camera poses, which are rarely available in real-world scenarios. When input views become extremely sparse, the Structure-from-Motion (SfM) method that 3DGS depends on for initialization fails to accurately reco…

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: 13 pages, 4 figures

  9. arXiv:2508.04745  [pdf, ps, other]

    cs.LG

    Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC)

    Authors: Nan Li, Wanting Yang, Marie Siew, Zehui Xiong, Binbin Chen, Shiwen Mao, Kwok-Yan Lam

    Abstract: Diffusion models (DMs) have emerged as powerful tools for high-quality content generation, yet their intensive computational requirements for inference pose challenges for resource-constrained edge devices. Cloud-based solutions aid in computation but often fall short in addressing privacy risks, personalization efficiency, and communication costs in multi-user edge-AIGC scenarios. To bridge this…

    Submitted 6 August, 2025; originally announced August 2025.

  10. arXiv:2508.00537  [pdf, ps, other]

    cs.CL

    The Prosody of Emojis

    Authors: Giulio Zhou, Tsz Kin Lam, Alexandra Birch, Barry Haddow

    Abstract: Prosodic features such as pitch, timing, and intonation are central to spoken communication, conveying emotion, intent, and discourse structure. In text-based settings, where these cues are absent, emojis act as visual surrogates that add affective and pragmatic nuance. This study examines how emojis influence prosodic realisation in speech and how listeners interpret prosodic cues to recover emoj…

    Submitted 1 August, 2025; originally announced August 2025.

  11. arXiv:2507.20720  [pdf]

    cs.HC

    Beyond Text: Probing K-12 Educators' Perspectives and Ideas for Learning Opportunities Leveraging Multimodal Large Language Models

    Authors: Tiffany Tseng, Katelyn Lam, Tiffany Lin Fu, Alekhya Maram

    Abstract: Multimodal Large Language Models (MLLMs) are beginning to empower new user experiences that can flexibly generate content from a range of inputs, including images, text, speech, and video. These capabilities have the potential to enrich learning by enabling users to capture and interact with information using a variety of modalities, but little is known about how educators envision how MLLMs might…

    Submitted 28 July, 2025; originally announced July 2025.

  12. arXiv:2507.18064  [pdf, ps, other]

    cs.CV

    Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement

    Authors: Xiaoran Sun, Liyan Wang, Cong Wang, Yeying Jin, Kin-man Lam, Zhixun Su, Yang Yang, Jinshan Pan

    Abstract: Most existing low-light image enhancement (LLIE) methods rely on pre-trained model priors, low-light inputs, or both, while neglecting the semantic guidance available from normal-light images. This limitation hinders their effectiveness in complex lighting conditions. In this paper, we propose VLM-IMI, a novel framework that leverages large vision-language models (VLMs) with iterative and manual i…

    Submitted 23 July, 2025; originally announced July 2025.

  13. arXiv:2507.05994  [pdf, ps, other]

    q-fin.PM cs.IT cs.LG q-fin.CP

    Beating the Best Constant Rebalancing Portfolio in Long-Term Investment: A Generalization of the Kelly Criterion and Universal Learning Algorithm for Markets with Serial Dependence

    Authors: Duy Khanh Lam

    Abstract: In the online portfolio optimization framework, existing learning algorithms generate strategies that yield significantly poorer cumulative wealth compared to the best constant rebalancing portfolio in hindsight, despite being consistent in asymptotic growth rate. While this unappealing performance can be improved by incorporating more side information, it raises difficulties in feature selection…

    Submitted 8 July, 2025; originally announced July 2025.

    Comments: 19 pages, 7 figures. Working paper (1st full draft); typos may exist

  14. arXiv:2507.02000  [pdf, ps, other]

    cs.IR cs.CL cs.MM

    Why Multi-Interest Fairness Matters: Hypergraph Contrastive Multi-Interest Learning for Fair Conversational Recommender System

    Authors: Yongsen Zheng, Zongxuan Xie, Guohua Wang, Ziyao Liu, Liang Lin, Kwok-Yan Lam

    Abstract: Unfairness is a well-known challenge in Recommender Systems (RSs), often resulting in biased outcomes that disadvantage users or items based on attributes such as gender, race, age, or popularity. Although some approaches have started to improve fairness recommendation in offline or static contexts, the issue of unfairness often exacerbates over time, leading to significant problems like the Matth…

    Submitted 1 July, 2025; originally announced July 2025.

  15. arXiv:2506.22963  [pdf, ps, other]

    stat.ML cs.LG q-bio.GN

    CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

    Authors: Kevin Lam, William Daniels, J Maxwell Douglas, Daniel Lai, Samuel Aparicio, Benjamin Bloem-Reddy, Yongjin Park

    Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-…

    Submitted 28 June, 2025; originally announced June 2025.

    Comments: 8 pages, 4 figures

  16. arXiv:2506.20702  [pdf]

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai , et al. (63 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  17. arXiv:2506.09443  [pdf, ps, other]

    cs.CR

    LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

    Authors: Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji

    Abstract: Large Language Models (LLMs) have demonstrated remarkable intelligence across various tasks, which has inspired the development and widespread adoption of LLM-as-a-Judge systems for automated model testing, such as red teaming and benchmarking. However, these systems are susceptible to adversarial attacks that can manipulate evaluation outcomes, raising concerns about their robustness and, consequ…

    Submitted 11 June, 2025; originally announced June 2025.

  18. arXiv:2506.07687  [pdf, ps, other]

    stat.ML cs.LG

    Rao-Blackwellised Reparameterisation Gradients

    Authors: Kevin Lam, Thang Bui, George Deligiannidis, Yee Whye Teh

    Abstract: Latent Gaussian variables have been popularised in probabilistic machine learning. In turn, gradient estimators are the machinery that facilitates gradient-based optimisation for models with latent Gaussian variables. The reparameterisation trick is often used as the default estimator as it is simple to implement and yields low-variance gradients for variational inference. In this work, we propose…

    Submitted 9 June, 2025; originally announced June 2025.

  19. arXiv:2505.16161  [pdf, other]

    cs.CV

    Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey

    Authors: Liyan Wang, Weixiang Zhou, Cong Wang, Kin-Man Lam, Zhixun Su, Jinshan Pan

    Abstract: Ultra-high-definition (UHD) image restoration aims to specifically solve the problem of quality degradation in ultra-high-resolution images. Recent advancements in this field are predominantly driven by deep learning-based innovations, including enhancements in dataset construction, network architecture, sampling strategies, prior knowledge integration, and loss functions. In this paper, we system…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 20 papers, 12 figures

  20. arXiv:2505.14005  [pdf, ps, other]

    cs.LG cs.AI

    Towards Comprehensive and Prerequisite-Free Explainer for Graph Neural Networks

    Authors: Han Zhang, Yan Wang, Guanfeng Liu, Pengfei Ding, Huaxiong Wang, Kwok-Yan Lam

    Abstract: To enhance the reliability and credibility of graph neural networks (GNNs) and improve the transparency of their decision logic, a new field of explainability of GNNs (XGNN) has emerged. However, two major limitations severely degrade the performance and hinder the generalizability of existing XGNN methods: they (a) fail to capture the complete decision logic of GNNs across diverse distributions i…

    Submitted 23 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025 AI4Tech Track

  21. arXiv:2505.12786  [pdf, ps, other]

    cs.NI

    Forewarned is Forearmed: A Survey on Large Language Model-based Agents in Autonomous Cyberattacks

    Authors: Minrui Xu, Jiani Fan, Xinyu Huang, Conghao Zhou, Jiawen Kang, Dusit Niyato, Shiwen Mao, Zhu Han, Xuemin Shen, Kwok-Yan Lam

    Abstract: With the continuous evolution of Large Language Models (LLMs), LLM-based agents have advanced beyond passive chatbots to become autonomous cyber entities capable of performing complex tasks, including web browsing, malicious code and deceptive content generation, and decision-making. By significantly reducing the time, expertise, and resources, AI-assisted cyberattacks orchestrated by LLM-based ag…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  22. arXiv:2505.06679  [pdf, ps, other]

    cs.CV

    T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks

    Authors: Jiayang Liu, Siyuan Liang, Shiqian Zhao, Rongcheng Tu, Wenbo Zhou, Aishan Liu, Dacheng Tao, Siew Kei Lam

    Abstract: In recent years, fueled by the rapid advancement of diffusion models, text-to-video (T2V) generation models have achieved remarkable progress, with notable examples including Pika, Luma, Kling, and Open-Sora. Although these models exhibit impressive generative capabilities, they also expose significant security risks due to their vulnerability to jailbreak attacks, where the models are manipulated…

    Submitted 17 June, 2025; v1 submitted 10 May, 2025; originally announced May 2025.

  23. arXiv:2504.20118  [pdf, ps, other]

    cs.IR cs.AI

    OpenTCM: A GraphRAG-Empowered LLM-based System for Traditional Chinese Medicine Knowledge Retrieval and Diagnosis

    Authors: Jinglin He, Yunqi Guo, Lai Kwan Lam, Waikei Leung, Lixing He, Yuanan Jiang, Chi Chiu Wang, Guoliang Xing, Hongkai Chen

    Abstract: Traditional Chinese Medicine (TCM) represents a rich repository of ancient medical knowledge that continues to play an important role in modern healthcare. Due to the complexity and breadth of the TCM literature, the integration of AI technologies is critical for its modernization and broader accessibility. However, this integration poses considerable challenges, including the interpretation of ob…

    Submitted 27 June, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

    Comments: 8 pages, 5 figures, 7 tables

  24. arXiv:2504.07099  [pdf, ps, other]

    cs.CE

    Time Series Analysis in Frequency Domain: A Survey of Open Challenges, Opportunities and Benchmarks

    Authors: Qianru Zhang, Yuting Sun, Honggang Wen, Peng Yang, Xinzhu Li, Ming Li, Kwok-Yan Lam, Siu-Ming Yiu, Hongzhi Yin

    Abstract: Frequency-domain analysis has emerged as a powerful paradigm for time series analysis, offering unique advantages over traditional time-domain approaches while introducing new theoretical and practical challenges. This survey provides a comprehensive examination of spectral methods from classical Fourier analysis to modern neural operators, systematically summarizing three open challenges in curre…

    Submitted 23 September, 2025; v1 submitted 11 February, 2025; originally announced April 2025.

    Comments: 35 pages

  25. arXiv:2504.03759  [pdf, other]

    cs.CR cs.AI

    Emerging Cyber Attack Risks of Medical AI Agents

    Authors: Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, Kyle Lam, Wu Yuan

    Abstract: Large language models (LLMs)-powered AI agents exhibit a high level of autonomy in addressing medical and healthcare challenges. With the ability to access various tools, they can operate within an open-ended action space. However, with the increase in autonomy and ability, unforeseen risks also arise. In this work, we investigated one particular risk, i.e., cyber attack vulnerability of medical A…

    Submitted 2 April, 2025; originally announced April 2025.

  26. arXiv:2504.01308  [pdf, ps, other]

    cs.CV

    Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks

    Authors: Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yicheng Fu, Yichun Feng, Kin-Man Lam

    Abstract: Vision-Language Models (VLMs) extend the capabilities of Large Language Models (LLMs) by incorporating visual information, yet they remain vulnerable to jailbreak attacks, especially when processing noisy or corrupted images. Although existing VLMs adopt security measures during training to mitigate such attacks, vulnerabilities associated with noise-augmented visual inputs are overlooked. In this…

    Submitted 2 August, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: ICCV 2025

  27. arXiv:2503.14260  [pdf, other]

    physics.optics cs.LG

    Automating Experimental Optics with Sample Efficient Machine Learning Methods

    Authors: Arindam Saha, Baramee Charoensombutamon, Thibault Michel, V. Vijendran, Lachlan Walker, Akira Furusawa, Syed M. Assad, Ben C. Buchler, Ping Koy Lam, Aaron D. Tranter

    Abstract: As free-space optical systems grow in scale and complexity, troubleshooting becomes increasingly time-consuming and, in the case of remote installations, perhaps impractical. An example of a task that is often laborious is the alignment of a high-finesse optical resonator, which is highly sensitive to the mode of the input beam. In this work, we demonstrate how machine learning can be used to achi…

    Submitted 18 March, 2025; originally announced March 2025.

  28. arXiv:2503.10620  [pdf, other]

    cs.CL

    From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM

    Authors: Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins

    Abstract: Large language models (LLMs) have shown remarkable performance and generalization capabilities across multiple languages and tasks, making them very attractive targets for multi-modality integration (e.g., images or speech). In this work, we extend an existing LLM to the speech modality via speech discretization and continued pre-training. In particular, we are interested in multilingual LLMs, suc…

    Submitted 13 March, 2025; originally announced March 2025.

  29. arXiv:2503.10058  [pdf, other]

    cs.LG cs.AI cs.CR

    Deep Learning Approaches for Anti-Money Laundering on Mobile Transactions: Review, Framework, and Directions

    Authors: Jiani Fan, Lwin Khin Shar, Ruichen Zhang, Ziyao Liu, Wenzhuo Yang, Dusit Niyato, Bomin Mao, Kwok-Yan Lam

    Abstract: Money laundering is a financial crime that obscures the origin of illicit funds, necessitating the development and enforcement of anti-money laundering (AML) policies by governments and organizations. The proliferation of mobile payment platforms and smart IoT devices has significantly complicated AML investigations. As payment networks become more interconnected, there is an increasing need for e…

    Submitted 13 March, 2025; originally announced March 2025.

  30. arXiv:2503.07048  [pdf, other]

    cs.CR

    A Failure-Free and Efficient Discrete Laplace Distribution for Differential Privacy in MPC

    Authors: Ivan Tjuawinata, Jiabo Wang, Mengmeng Yang, Shanxiang Lyu, Huaxiong Wang, Kwok-Yan Lam

    Abstract: In an MPC-protected distributed computation, although the use of MPC assures data privacy during computation, sensitive information may still be inferred by curious MPC participants from the computation output. This can be observed, for instance, in the inference attacks on either federated learning or a more standard statistical computation with distributed inputs. In this work, we address this o…

    Submitted 10 March, 2025; originally announced March 2025.

  31. arXiv:2502.14403  [pdf, other]

    cs.SI cs.CL cs.LG

    A Macro- and Micro-Hierarchical Transfer Learning Framework for Cross-Domain Fake News Detection

    Authors: Xuankai Yang, Yan Wang, Xiuzhen Zhang, Shoujin Wang, Huaxiong Wang, Kwok Yan Lam

    Abstract: Cross-domain fake news detection aims to mitigate domain shift and improve detection performance by transferring knowledge across domains. Existing approaches transfer knowledge based on news content and user engagements from a source domain to a target domain. However, these approaches face two main limitations, hindering effective knowledge transfer and optimal fake news detection performance. F…

    Submitted 24 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 11 pages, 8 figures, to be published in The 2025 ACM Web Conference (WWW '25)

  32. arXiv:2502.04394  [pdf, other]

    cs.CL cs.AI

    DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease

    Authors: Tingyu Mo, Jacqueline C. K. Lam, Victor O. K. Li, Lawrence Y. L. Cheung

    Abstract: Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be…

    Submitted 26 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  33. arXiv:2502.03501  [pdf, other]

    eess.IV cs.LG

    Proxy Prompt: Endowing SAM and SAM 2 with Auto-Interactive-Prompt for Medical Segmentation

    Authors: Wang Xinyi, Kang Hongyu, Wei Peishan, Shuai Li, Yu Sun, Sai Kit Lam, Yongping Zheng

    Abstract: In this paper, we aim to address the unmet demand for automated prompting and enhanced human-model interactions of SAM and SAM2 for the sake of promoting their widespread clinical adoption. Specifically, we propose Proxy Prompt (PP), auto-generated by leveraging non-target data with a pre-annotated mask. We devise a novel 3-step context-selection strategy for adaptively selecting the most represen…

    Submitted 8 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  34. arXiv:2501.16419  [pdf, other]

    quant-ph cs.DS cs.ET math.OC

    Near-Optimal Parameter Tuning of Level-1 QAOA for Ising Models

    Authors: V Vijendran, Dax Enshan Koh, Eunok Bae, Hyukjoon Kwon, Ping Koy Lam, Syed M Assad

    Abstract: The Quantum Approximate Optimisation Algorithm (QAOA) is a hybrid quantum-classical algorithm for solving combinatorial optimisation problems. QAOA encodes solutions into the ground state of a Hamiltonian, approximated by a $p$-level parameterised quantum circuit composed of problem and mixer Hamiltonians, with parameters optimised classically. While deeper QAOA circuits can offer greater accuracy…

    Submitted 15 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 54 pages, 7 Figures, Made Minor Changes

  35. SMART-Vision: Survey of Modern Action Recognition Techniques in Vision

    Authors: Ali K. AlShami, Ryan Rabinowitz, Khang Lam, Yousra Shleibik, Melkamu Mersha, Terrance Boult, Jugal Kalita

    Abstract: Human Action Recognition (HAR) is a challenging domain in computer vision, involving recognizing complex patterns by analyzing the spatiotemporal dynamics of individuals' movements in videos. These patterns arise in sequential data, such as video frames, which are often essential to accurately distinguish actions that would be ambiguous in a single image. HAR has garnered considerable interest due…

    Submitted 22 January, 2025; originally announced January 2025.

    Journal ref: Multimedia Tools and Applications, Springer, 2024, pp. 1-72

  36. arXiv:2501.11508  [pdf, other]

    cs.CV

    See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization

    Authors: Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam

    Abstract: 3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis. However, its rendering quality deteriorates with sparse input views, leading to distorted content and reduced details. This limitation hinders its practical application. To address this issue, we propose a sparse-view 3DGS method. Given the inherently ill-posed nature of sparse-view rendering, incorporating pri…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: 5 pages, 5 figures, has been accepted by the ICASSP 2025

  37. arXiv:2501.06701  [pdf, ps, other]

    q-fin.MF cs.IT cs.LG math.PR q-fin.PM

    Sequential Portfolio Selection under Latent Side Information-Dependence Structure: Optimality and Universal Learning Algorithms

    Authors: Duy Khanh Lam

    Abstract: This paper investigates the investment problem of constructing an optimal no-short sequential portfolio strategy in a market with a latent dependence structure between asset prices and partly unobservable side information, which is often high-dimensional. The results demonstrate that a dynamic strategy, which forms a portfolio based on perfect knowledge of the dependence structure and full market…

    Submitted 19 January, 2025; v1 submitted 11 January, 2025; originally announced January 2025.

    Comments: 34 pages, working paper, second draft (with the remark in section 3.2 removed from the first draft)

  38. arXiv:2501.04952  [pdf, other]

    cs.LG cs.AI cs.CY

    Open Problems in Machine Unlearning for AI Safety

    Authors: Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal

    Abstract: As AI systems become more capable, widely deployed, and increasingly autonomous in critical areas such as cybersecurity, biological research, and healthcare, ensuring their safety and alignment with human values is paramount. Machine unlearning -- the ability to selectively forget or suppress specific types of knowledge -- has shown promise for privacy and data removal tasks, which has been the pr…

    Submitted 8 January, 2025; originally announced January 2025.

  39. arXiv:2501.02370  [pdf, other]

    cs.CL cs.SD eess.AS

    Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

    Authors: Tsz Kin Lam, Marco Gaido, Sara Papi, Luisa Bentivogli, Barry Haddow

    Abstract: Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is increasing interest in extending their capabilities to speech -- the most common form of communication. The most widespread approach to integrating speech into LLMs is dense feature prepending (DFP), which prepends the projected speech representations to the textual representations, allowing end-to-end training…

    Submitted 7 February, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted at NAACL 2025

  40. Artificial Intelligence without Restriction Surpassing Human Intelligence with Probability One: Theoretical Insight into Secrets of the Brain with AI Twins of the Brain

    Authors: Guang-Bin Huang, M. Brandon Westover, Eng-King Tan, Haibo Wang, Dongshun Cui, Wei-Ying Ma, Tiantong Wang, Qi He, Haikun Wei, Ning Wang, Qiyuan Tian, Kwok-Yan Lam, Xin Yao, Tien Yin Wong

    Abstract: Artificial Intelligence (AI) has apparently become one of the most important techniques discovered by humans in history while the human brain is widely recognized as one of the most complex systems in the universe. One fundamental critical question which would affect human sustainability remains open: Will artificial intelligence (AI) evolve to surpass human intelligence in the future? This paper…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: Accepted by journal Neurocomputing

  41. arXiv:2412.05462  [pdf, other]

    cs.CV

    COOOL: Challenge Of Out-Of-Label A Novel Benchmark for Autonomous Driving

    Authors: Ali K. AlShami, Ananya Kalita, Ryan Rabinowitz, Khang Lam, Rishabh Bezbarua, Terrance Boult, Jugal Kalita

    Abstract: As the Computer Vision community rapidly develops and advances algorithms for autonomous driving systems, the goal of safer and more efficient autonomous transportation is becoming increasingly achievable. However, it is 2024, and we still do not have fully self-driving cars. One of the remaining core challenges lies in addressing the novelty problem, where self-driving systems still struggle to h…

    Submitted 6 December, 2024; originally announced December 2024.

  42. arXiv:2412.00085  [pdf]

    cs.CV eess.IV

    Residual Attention Single-Head Vision Transformer Network for Rolling Bearing Fault Diagnosis in Noisy Environments

    Authors: Songjiang Lai, Tsun-Hin Cheung, Jiayi Zhao, Kaiwen Xue, Ka-Chun Fung, Kin-Man Lam

    Abstract: Rolling bearings play a crucial role in industrial machinery, directly influencing equipment performance, durability, and safety. However, harsh operating conditions, such as high speeds and temperatures, often lead to bearing malfunctions, resulting in downtime, economic losses, and safety hazards. This paper proposes the Residual Attention Single-Head Vision Transformer Network (RA-SHViT-Net) fo…

    Submitted 26 November, 2024; originally announced December 2024.

    Comments: 24 pages, 14 figures, 3 tables

  43. arXiv:2411.19220  [pdf, other]

    cs.CV cs.MM

    Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection

    Authors: Tsun-Hin Cheung, Ka-Chun Fung, Songjiang Lai, Kwan-Ho Lin, Vincent Ng, Kin-Man Lam

    Abstract: Identifying defects and anomalies in industrial products is a critical quality control task. Traditional manual inspection methods are slow, subjective, and error-prone. In this work, we propose a novel zero-shot training-free approach for automated industrial image anomaly detection using a multimodal machine learning pipeline, consisting of three foundation models. Our method first uses a large…

    Submitted 28 November, 2024; originally announced November 2024.

    Comments: Accepted to APSIPA ASC 2024

  44. arXiv:2411.19093  [pdf]

    cs.CV cs.CY cs.LG

    Tracking Progress Towards Sustainable Development Goal 6 Using Satellite Imagery

    Authors: Othmane Echchabi, Aya Lahlou, Nizar Talty, Josh Malcolm Manto, Ka Leung Lam

    Abstract: Clean water and sanitation are essential for health, well-being, and sustainable development, yet significant global disparities persist. Although the United Nations' Sustainable Development Goal (SDG) 6 clearly defines targets for universal access to clean water and sanitation, limitations in data coverage and openness impede accurate tracking of progress in many countries. To bridge these gaps,…

    Submitted 29 May, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

  45. arXiv:2411.18280  [pdf, other]

    cs.CL

    Neutralizing Backdoors through Information Conflicts for Large Language Models

    Authors: Chen Chen, Yuchen Sun, Xueluan Gong, Jiaxin Gao, Kwok-Yan Lam

    Abstract: Large language models (LLMs) have seen significant advancements, achieving superior performance in various Natural Language Processing (NLP) tasks, from understanding to reasoning. However, they remain vulnerable to backdoor attacks, where models behave normally for standard queries but generate harmful responses or unintended output when specific triggers are activated. Existing backdoor defenses…

    Submitted 27 November, 2024; originally announced November 2024.

  46. arXiv:2411.18269  [pdf, other]

    cs.CL cs.CR

    Hidden Data Privacy Breaches in Federated Learning

    Authors: Xueluan Gong, Yuji Wang, Shuaike Li, Mengyuan Sun, Songze Li, Qian Wang, Kwok-Yan Lam, Chen Chen

    Abstract: Federated Learning (FL) emerged as a paradigm for conducting machine learning across broad and decentralized datasets, promising enhanced privacy by obviating the need for direct data sharing. However, recent studies show that attackers can steal private data through model manipulation or gradient analysis. Existing attacks are constrained by low theft quantity or low-resolution data, and they are…

    Submitted 27 November, 2024; originally announced November 2024.

  47. arXiv:2411.18003  [pdf, other]

    eess.IV cs.AI cs.CV

    HAAT: Hybrid Attention Aggregation Transformer for Image Super-Resolution

    Authors: Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kai-wen Xue, Kin-Man Lam

    Abstract: In the research area of image super-resolution, Swin-transformer-based models are favored for their global spatial modeling and shifting window attention mechanism. However, existing methods often limit self-attention to non overlapping windows to cut costs and ignore the useful information that exists across channels. To address this issue, this paper introduces a novel model, the Hybrid Attentio…

    Submitted 10 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: 6 pages, 2 figures, 1 table

  48. arXiv:2411.18002  [pdf]

    cs.CV cs.AI

    An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition

    Authors: Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Tian-Shan Liu, Kin-Man Lam

    Abstract: With the rapid advancements in deep learning, computer vision tasks have seen significant improvements, making two-stream neural networks a popular focus for video based action recognition. Traditional models using RGB and optical flow streams achieve strong performance but at a high computational cost. To address this, we introduce a representation flow algorithm to replace the optical flow branc…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 6 pages, 3 figures, 9 tables

  49. arXiv:2411.12450  [pdf, other]

    cs.CV eess.IV

    Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models

    Authors: Jun Xiao, Zihang Lyu, Hao Xie, Cong Zhang, Yakun Ju, Changjian Shui, Kin-Man Lam

    Abstract: Blind image restoration remains a significant challenge in low-level vision tasks. Recently, denoising diffusion models have shown remarkable performance in image synthesis. Guided diffusion models, leveraging the potent generative priors of pre-trained models along with a differential guidance loss, have achieved promising results in blind image restoration. However, these models typically consid…

    Submitted 19 November, 2024; originally announced November 2024.

    Comments: 17 pages, 6 figures, has been accepted by the ECCV 2024: AIM workshop

  50. arXiv:2411.11044  [pdf, ps, other]

    cs.CR cs.LG

    Efficient Federated Unlearning with Adaptive Differential Privacy Preservation

    Authors: Yu Jiang, Xindi Tong, Ziyao Liu, Huanyi Ye, Chee Wei Tan, Kwok-Yan Lam

    Abstract: Federated unlearning (FU) offers a promising solution to effectively address the need to erase the impact of specific clients' data on the global model in federated learning (FL), thereby granting individuals the "Right to be Forgotten". The most straightforward approach to achieve unlearning is to train the model from scratch, excluding clients who request data removal, but it is resource-intens…

    Submitted 17 November, 2024; originally announced November 2024.