Showing 1–50 of 141 results for author: Kwok, T

  1. arXiv:2506.08303  [pdf, ps, other]

    cs.HC

    EMG-Driven Stiffness-Modulating Palpation for Telerehabilitation

    Authors: Thomas M. Kwok, Hilary HY Cheng, Wai Tuck Chow

    Abstract: In this work, we introduce HJ-Pal, a lightweight wearable haptic device that leverages EMG-driven honeycomb jamming to render muscle activation as kinesthetic feedback, enabling remote palpation for small muscle assessment in telerehabilitation.

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted by the Workshop on Human-Robot Contact and Manipulation (HRCM 2025) at RSS Conference 2025

  2. arXiv:2506.08089  [pdf, ps, other]

    hep-ph hep-ex

    Time-Dependent Precision Measurement of $B_s^0 \rightarrow \phi\mu^+\mu^-$ Decay at FCC-$ee$

    Authors: Tsz Hong Kwok, Zachary Polonsky, Valeriia Lukashenko, Jason Aebischer, Ben Kilminster

    Abstract: We study the feasibility of measuring time-dependent $C\!P$ violation in the rare flavor-changing neutral current (FCNC) decay $B_s^0 \rightarrow \phi(\rightarrow K^+K^-) \mu^+ \mu^-$ at the FCC-$ee$. In the Standard Model (SM), $C\!P$ violation in this mode arises only at higher orders and is highly suppressed. Extensions of the SM, collectively referred to as New Physics (NP), can introduce additional…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 38 pages, 20 figures

    Report number: CERN-EP-DRAFT-MISC-2025-007, CERN-TH-2025-095

  3. arXiv:2506.05936  [pdf, other]

    cs.CL cs.AI

    DynamicMind: A Tri-Mode Thinking System for Large Language Models

    Authors: Wei Li, Yanbin Wei, Qiushi Huang, Jiangyue Yan, Yang Chen, James T. Kwok, Yu Zhang

    Abstract: Modern large language models (LLMs) often struggle to dynamically adapt their reasoning depth to varying task complexities, leading to suboptimal performance or inefficient resource utilization. To address this, we introduce DynamicMind, a novel tri-mode thinking system. DynamicMind empowers LLMs to autonomously select between Fast, Normal, and Slow thinking modes for zero-shot question answering…

    Submitted 6 June, 2025; originally announced June 2025.

  4. arXiv:2506.04559  [pdf, other]

    cs.CV

    Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

    Authors: Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

    Abstract: Recent advances in slow-thinking language models (e.g., OpenAI-o1 and DeepSeek-R1) have demonstrated remarkable abilities in complex reasoning tasks by emulating human-like reflective cognition. However, extending such capabilities to multi-modal large language models (MLLMs) remains challenging due to the high cost of retraining vision-language alignments when upgrading the underlying reasoner LL…

    Submitted 4 June, 2025; originally announced June 2025.

  5. arXiv:2505.11781  [pdf, other]

    cs.LG

    Multi-Order Wavelet Derivative Transform for Deep Time Series Forecasting

    Authors: Ziyu Zhou, Jiaxi Hu, Qingsong Wen, James T. Kwok, Yuxuan Liang

    Abstract: In deep time series forecasting, the Fourier Transform (FT) is extensively employed for frequency representation learning. However, it often struggles to capture multi-scale, time-sensitive patterns. Although the Wavelet Transform (WT) can capture these patterns through frequency decomposition, its coefficients are insensitive to change points in time series, leading to suboptimal modeling. To m…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Preprint. Work in progress

  6. arXiv:2505.09047  [pdf, ps, other]

    cs.HC

    Positioning Monocular Optical See Through Head Worn Displays in Glasses for Everyday Wear

    Authors: Parth Arora, Ethan Kimmel, Katherine Huang, Tyler Kwok, Yukun Song, Sofia Vempala, Georgianna Lin, Ozan Cakmakci, Thad Starner

    Abstract: Head-worn displays for everyday wear in the form of regular eyeglasses are technically feasible with recent advances in waveguide technology. One major design decision is determining where in the user's visual field to position the display. Centering the display in the principal point of gaze (PPOG) allows the user to switch attentional focus between the virtual and real images quickly, and best p…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.00274  [pdf]

    physics.acc-ph hep-ex hep-ph

    Future Circular Collider Feasibility Study Report: Volume 2, Accelerators, Technical Infrastructure and Safety

    Authors: M. Benedikt, F. Zimmermann, B. Auchmann, W. Bartmann, J. P. Burnet, C. Carli, A. Chancé, P. Craievich, M. Giovannozzi, C. Grojean, J. Gutleber, K. Hanke, A. Henriques, P. Janot, C. Lourenço, M. Mangano, T. Otto, J. Poole, S. Rajagopalan, T. Raubenheimer, E. Todesco, L. Ulrici, T. Watson, G. Wilkinson, A. Abada, et al. (1439 additional authors not shown)

    Abstract: In response to the 2020 Update of the European Strategy for Particle Physics, the Future Circular Collider (FCC) Feasibility Study was launched as an international collaboration hosted by CERN. This report describes the FCC integrated programme, which consists of two stages: an electron-positron collider (FCC-ee) in the first phase, serving as a high-luminosity Higgs, top, and electroweak factory;…

    Submitted 25 April, 2025; originally announced May 2025.

    Comments: 627 pages. Please address any comment or request to [email protected]

    Report number: CERN-FCC-ACC-2025-0004

  8. arXiv:2505.00273  [pdf, other]

    physics.acc-ph hep-ex hep-ph

    Future Circular Collider Feasibility Study Report: Volume 3, Civil Engineering, Implementation and Sustainability

    Authors: M. Benedikt, F. Zimmermann, B. Auchmann, W. Bartmann, J. P. Burnet, C. Carli, A. Chancé, P. Craievich, M. Giovannozzi, C. Grojean, J. Gutleber, K. Hanke, A. Henriques, P. Janot, C. Lourenço, M. Mangano, T. Otto, J. Poole, S. Rajagopalan, T. Raubenheimer, E. Todesco, L. Ulrici, T. Watson, G. Wilkinson, P. Azzi, et al. (1439 additional authors not shown)

    Abstract: Volume 3 of the FCC Feasibility Report presents studies related to civil engineering, the development of a project implementation scenario, and environmental and sustainability aspects. The report details the iterative improvements made to the civil engineering concepts since 2018, taking into account subsurface conditions, accelerator and experiment requirements, and territorial considerations. I…

    Submitted 25 April, 2025; originally announced May 2025.

    Comments: 357 pages. Please address any comment or request to [email protected]

    Report number: CERN-FCC-ACC-2025-0003

  9. arXiv:2505.00272  [pdf, other]

    hep-ex hep-ph physics.acc-ph

    Future Circular Collider Feasibility Study Report: Volume 1, Physics, Experiments, Detectors

    Authors: M. Benedikt, F. Zimmermann, B. Auchmann, W. Bartmann, J. P. Burnet, C. Carli, A. Chancé, P. Craievich, M. Giovannozzi, C. Grojean, J. Gutleber, K. Hanke, A. Henriques, P. Janot, C. Lourenço, M. Mangano, T. Otto, J. Poole, S. Rajagopalan, T. Raubenheimer, E. Todesco, L. Ulrici, T. Watson, G. Wilkinson, P. Azzi, et al. (1439 additional authors not shown)

    Abstract: Volume 1 of the FCC Feasibility Report presents an overview of the physics case, experimental programme, and detector concepts for the Future Circular Collider (FCC). This volume outlines how FCC would address some of the most profound open questions in particle physics, from precision studies of the Higgs and EW bosons and of the top quark, to the exploration of physics beyond the Standard Model.…

    Submitted 25 April, 2025; originally announced May 2025.

    Comments: 290 pages. Please address any comment or request to [email protected]

    Report number: CERN-FCC-PHYS-2025-0002

  10. arXiv:2504.14800  [pdf, other]

    cs.LG cs.CV

    A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions

    Authors: Shuxian Zhao, Jie Gui, Minjing Dong, Baosheng Yu, Zhipeng Gui, Lu Dong, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. It is characterized by a small number of samples and an imbalanced class distribution, which leads to poor model performance. In addition, indistinct inter-class feature distributions further complicate classification tasks. Existing methods often rely on algorithmic heuristics without sufficiently…

    Submitted 20 April, 2025; originally announced April 2025.

  11. arXiv:2504.14493  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

    Authors: Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Fengran Mo, Yufei Cui, Ling Zhou

    Abstract: Leveraging large language models in real-world settings often requires domain-specific data and tools in order to comply with the complex regulations that govern acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. Howeve…

    Submitted 6 June, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  12. arXiv:2504.14253  [pdf, other]

    cs.CV

    ColorVein: Colorful Cancelable Vein Biometrics

    Authors: Yifan Wang, Jie Gui, Xinli Shi, Linqing Gui, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Vein recognition technologies have become one of the primary solutions for high-security identification systems. However, the issue of biometric information leakage can still pose a serious threat to user privacy and anonymity. Currently, there is no cancelable biometric template generation scheme specifically designed for vein biometrics. Therefore, this paper proposes an innovative cancelable ve…

    Submitted 19 April, 2025; originally announced April 2025.

  13. arXiv:2504.07001  [pdf, other]

    cs.RO cs.HC

    Leveraging GCN-based Action Recognition for Teleoperation in Daily Activity Assistance

    Authors: Thomas M. Kwok, Jiaan Li, Yue Hu

    Abstract: Caregiving of older adults is an urgent global challenge, with many older adults preferring to age in place rather than enter residential care. However, providing adequate home-based assistance remains difficult, particularly in geographically vast regions. Teleoperated robots offer a promising solution, but conventional motion-mapping teleoperation imposes unnatural movement constraints on operat…

    Submitted 9 April, 2025; originally announced April 2025.

  14. arXiv:2503.15564  [pdf, other]

    cs.LG

    GReaTER: Generate Realistic Tabular data after data Enhancement and Reduction

    Authors: Tung Sum Thomas Kwok, Chi-Hua Wang, Guang Cheng

    Abstract: Tabular data synthesis involves not only multi-table synthesis but also generating multi-modal data (e.g., strings and categories), which enables diverse knowledge synthesis. However, separating numerical and categorical data has limited the effectiveness of tabular data generation. The GReaT (Generate Realistic Tabular Data) framework uses Large Language Models (LLMs) to encode entire rows, elimi…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Accepted by the Data Engineering Meets Large Language Models: Challenges and Opportunities Workshop at ICDE 2025

  15. arXiv:2503.15518  [pdf, other]

    cs.HC

    Robot Character Generation and Adaptive Human-Robot Interaction with Personality Shaping

    Authors: Cheng Tang, Chao Tang, Steven Gong, Thomas M. Kwok, Yue Hu

    Abstract: We present a novel framework for designing emotionally agile robots with dynamic personalities and memory-based learning, with the aim of performing adaptive and non-deterministic interactions with humans while conforming to shared social understanding. While existing work has largely focused on emotion recognition and static response systems, many approaches rely on sentiment analysis and action…

    Submitted 21 March, 2025; v1 submitted 2 February, 2025; originally announced March 2025.

  16. arXiv:2503.00293  [pdf]

    cs.RO

    A Practical Sensing Interface for Exoskeleton Evaluation in Workplaces using Interface Forces

    Authors: Joshua Leong Wei Ren, Thomas M. Kwok

    Abstract: This paper presents a novel approach to evaluating back support exoskeletons (BSEs) in workplace settings, addressing the limitations of traditional methods like electromyography (EMG), which are impractical due to their sensitivity to external disturbances and user sweat. Variability in BSE performance among users, often due to joint misalignment and anthropometric differences, can lead to discom…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: 6 pages, 5 figures, presented at IEEE International Conference on Robotics and Biomimetics (ROBIO) 10-14 Dec 2024

  17. arXiv:2502.12635  [pdf, other]

    cs.CV

    Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning

    Authors: Yunhao Gou, Hansi Yang, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, Bo Han, James T. Kwok, Yu Zhang

    Abstract: Visual Instruction Tuning (VIT) aims to enhance Multimodal Large Language Models (MLLMs), yet its effectiveness is often compromised by corrupted datasets with issues such as hallucinated content, incorrect responses, and poor OCR quality. Previous approaches to address these challenges have focused on refining datasets through high-quality data collection or rule-based filtering that can be costl…

    Submitted 27 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  18. arXiv:2501.10945  [pdf, other]

    cs.LG stat.ML

    Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond

    Authors: Weiyu Chen, Xiaoyuan Zhang, Baijiong Lin, Xi Lin, Han Zhao, Qingfu Zhang, James T. Kwok

    Abstract: Multi-objective optimization (MOO) in deep learning aims to simultaneously optimize multiple conflicting objectives, a challenge frequently encountered in areas like multi-task learning and multi-criteria learning. Recent advancements in gradient-based MOO methods have enabled the discovery of diverse types of solutions, ranging from a single balanced solution to finite or even infinite Pareto set…

    Submitted 3 March, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

  19. arXiv:2501.03727  [pdf, other]

    eess.AS cs.LG

    Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives

    Authors: Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Simon Wong, Brian Mak, Helene Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick Wong, Helen Meng

    Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Speech analysis offers a non-intrusive and scalable screening method, particularly through narrative tasks in neuropsychological assessment tools. Traditional narrative analysis often focuses on local indicators in microstructure, such as word usage and syntax. While these features provide…

    Submitted 7 January, 2025; originally announced January 2025.

    Comments: 12 pages, 8 figures

  20. arXiv:2501.02814  [pdf]

    physics.ao-ph cs.LG

    Analogue Forecast System for Daily Precipitation Prediction Using Autoencoder Feature Extraction: Application in Hong Kong

    Authors: Yee Chun Tsoi, Yu Ting Kwok, Ming Chun Lam, Wai Kin Wong

    Abstract: At the Hong Kong Observatory, the Analogue Forecast System (AFS) for precipitation has been providing a useful reference for predicting possible daily rainfall scenarios for the next 9 days, by identifying historical cases with similar weather patterns to the latest output from the deterministic model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Recent advances in machine learni…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: 16 pages, 10 figures

    Journal ref: Hong Kong Meteorological Society E-BULLETIN Vol. 28, 2 (2024)

  21. arXiv:2412.19743  [pdf, other]

    hep-ex hep-ph

    Flavor Physics at CEPC: a General Perspective

    Authors: Xiaocong Ai, Wolfgang Altmannshofer, Peter Athron, Xiaozhi Bai, Lorenzo Calibbi, Lu Cao, Yuzhi Che, Chunhui Chen, Ji-Yuan Chen, Long Chen, Mingshui Chen, Shanzhen Chen, Xuan Chen, Shan Cheng, Cheng-Wei Chiang, Andreas Crivellin, Hanhua Cui, Olivier Deschamps, Sébastien Descotes-Genon, Xiaokang Du, Shuangshi Fang, Yu Gao, Li-Sheng Geng, Pablo Goldenzweig, Jiayin Gu, et al. (116 additional authors not shown)

    Abstract: We discuss the landscape of flavor physics at the Circular Electron-Positron Collider (CEPC), based on the nominal luminosity outlined in its Technical Design Report. The CEPC is designed to operate in multiple modes to address a variety of tasks. At the $Z$ pole, the expected production of 4 Tera $Z$ bosons will provide unique and highly precise measurements of $Z$ boson couplings, while the subs…

    Submitted 31 December, 2024; v1 submitted 27 December, 2024; originally announced December 2024.

  22. arXiv:2411.00879  [pdf, other]

    cs.DB cs.LG

    DEREC-SIMPRO: unlock Language Model benefits to advance Synthesis in Data Clean Room

    Authors: Tung Sum Thomas Kwok, Chi-hua Wang, Guang Cheng

    Abstract: Data collaboration via Data Clean Room offers value but raises privacy concerns, which can be addressed through synthetic data and multi-table synthesizers. Common multi-table synthesizers fail to perform when subjects occur repeatedly in both tables. This is an urgent yet unresolved problem, since having both tables with repeating subjects is common. To improve performance in this scenario, we pr…

    Submitted 31 October, 2024; originally announced November 2024.

  23. arXiv:2409.19886  [pdf, other]

    cs.LG cs.AI cs.CL

    RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models

    Authors: Shuhao Chen, Weisen Jiang, Baijiong Lin, James T. Kwok, Yu Zhang

    Abstract: Recent works show that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. To achieve this, routing is a promising method, which learns a router to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well for a query. To address this problem, in this paper, we propose a method…

    Submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted by NeurIPS 2024

  24. arXiv:2409.19685  [pdf, other]

    cs.CV

    Underwater Organism Color Enhancement via Color Code Decomposition, Adaptation and Interpolation

    Authors: Xiaofeng Cong, Jing Zhang, Yeying Jin, Junming Hou, Yu Zhao, Jie Gui, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Underwater images often suffer from quality degradation due to absorption and scattering effects. Most existing underwater image enhancement algorithms produce a single, fixed-color image, limiting user flexibility and application. To address this limitation, we propose a method called ColorCode, which enhances underwater images while offering a range of controllable color outputs. Our ap…

    Submitted 29 September, 2024; originally announced September 2024.

  25. arXiv:2409.18042  [pdf, other]

    cs.CV cs.CL

    EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

    Authors: Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, et al. (6 additional authors not shown)

    Abstract: GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, text, and speech end-to-end with publicly available data remains challenging for the open-source community. Existing vision-language models rely on external tools for speech pr…

    Submitted 20 March, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

    Comments: Accepted by CVPR 2025. Project Page: https://emova-ollm.github.io/

  26. arXiv:2409.17589  [pdf, other]

    cs.CV cs.AI

    Improving Fast Adversarial Training via Self-Knowledge Guidance

    Authors: Chengze Jiang, Junkai Wang, Minjing Dong, Jie Gui, Xinli Shi, Yuan Cao, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Adversarial training has achieved remarkable advancements in defending against adversarial attacks. Among them, fast adversarial training (FAT) is gaining attention for its ability to achieve competitive robustness with fewer computing resources. Existing FAT methods typically employ a uniform strategy that optimizes all training data equally without considering the influence of different examples…

    Submitted 26 September, 2024; originally announced September 2024.

    Comments: 13 pages

  27. CFVNet: An End-to-End Cancelable Finger Vein Network for Recognition

    Authors: Yifan Wang, Jie Gui, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: Finger vein recognition technology has become one of the primary solutions for high-security identification systems. However, it still has information leakage problems, which seriously jeopardize users' privacy and anonymity and cause great security risks. In addition, no existing work considers a fully integrated secure finger vein recognition system. So, different from the previous systems, we…

    Submitted 23 September, 2024; originally announced September 2024.

    Journal ref: in IEEE Transactions on Information Forensics and Security, vol. 19, pp. 7810-7823, 2024

  28. arXiv:2409.06420  [pdf, other]

    eess.IV cs.CV

    Unrevealed Threats: A Comprehensive Study of the Adversarial Robustness of Underwater Image Enhancement Models

    Authors: Siyu Zhai, Zhibo He, Xiaofeng Cong, Junming Hou, Jie Gui, Jian Wei You, Xin Gong, James Tin-Yau Kwok, Yuan Yan Tang

    Abstract: Learning-based methods for underwater image enhancement (UWIE) have undergone extensive exploration. However, learning-based models are usually vulnerable to adversarial examples, and UWIE models are no exception. To the best of our knowledge, there is no comprehensive study on the adversarial robustness of UWIE models, which indicates that UWIE models are potentially under the threat of adversarial attacks.…

    Submitted 10 September, 2024; originally announced September 2024.

  29. arXiv:2408.13126  [pdf, other]

    cs.CV

    CathAction: A Benchmark for Endovascular Intervention Understanding

    Authors: Baoru Huang, Tuan Vo, Chayun Kongtongvattana, Giulio Dagnino, Dennis Kundrat, Wenqiang Chi, Mohamed Abdelaziz, Trevor Kwok, Tudor Jianu, Tuong Do, Hieu Le, Minh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Jianyang Xie, Yanda Meng, Binod Bhattarai, Zhaorui Tan, Hongbin Liu, Hong Seng Gan, Wei Wang, Xi Yang, Qiufeng Wang, Jionglong Su, et al. (13 additional authors not shown)

    Abstract: Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small in scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale datase…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages. Webpage: https://airvlab.github.io/cathaction/

  30. arXiv:2407.20734  [pdf, other]

    cs.LG

    Efficient Pareto Manifold Learning with Low-Rank Structure

    Authors: Weiyu Chen, James T. Kwok

    Abstract: Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms are developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous Pareto front approximations using a linear combination of base networks have emerged as a compelling strategy. However, it suffers from scalability issues when…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: ICML 2024 (Spotlight)

  31. arXiv:2407.03641  [pdf, other]

    cs.LG

    Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy

    Authors: Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok

    Abstract: Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups" (Wortsman et al., 2022) via exploring various hyperparameter configurations. The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned mod…

    Submitted 23 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  32. arXiv:2406.13183  [pdf, other]

    cs.LG cs.CR cs.DC

    Communication-Efficient and Privacy-Preserving Decentralized Meta-Learning

    Authors: Hansi Yang, James T. Kwok

    Abstract: Distributed learning, which does not require gathering training data in a central location, has become increasingly important in the big-data era. In particular, random-walk-based decentralized algorithms are flexible in that they do not need a central server trusted by all clients and do not require all clients to be active in all iterations. However, existing distributed learning algorithms assu…

    Submitted 18 June, 2024; originally announced June 2024.

  33. arXiv:2406.01417  [pdf, other]

    cs.LG cs.CV

    Mixup Augmentation with Multiple Interpolations

    Authors: Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok

    Abstract: Mixup and its variants form a popular class of data augmentation techniques. Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only a single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pai…

    Submitted 3 June, 2024; originally announced June 2024.

  34. arXiv:2405.21040  [pdf, other]

    cs.CL cs.AI

    Direct Alignment of Language Models via Quality-Aware Self-Refinement

    Authors: Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been commonly used to align the behaviors of Large Language Models (LLMs) with human preferences. Recently, a popular alternative is Direct Preference Optimization (DPO), which replaces an LLM-based reward model with the policy itself, thus obviating the need for extra memory and training time to learn the reward model. However, DPO does not consid…

    Submitted 31 May, 2024; originally announced May 2024.

  35. arXiv:2405.00557  [pdf, ps, other]

    cs.CL cs.AI

    Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

    Authors: Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

    Abstract: As the capabilities of large language models (LLMs) continue to expand, aligning these models with human values remains a significant challenge. Recent studies show that reasoning abilities contribute significantly to model safety, while integrating Mixture-of-Experts (MoE) architectures can further enhance alignment. In this work, we address a fundamental question: How to effectively incorporate…

    Submitted 1 June, 2025; v1 submitted 1 May, 2024; originally announced May 2024.

  36. arXiv:2403.09572  [pdf, other]

    cs.CV

    Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

    Authors: Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang

    Abstract: Multimodal large language models (MLLMs) have shown impressive reasoning abilities. However, they are also more vulnerable to jailbreak attacks than their LLM predecessors. We observe that, although MLLMs are still capable of detecting unsafe responses, the safety mechanisms of their pre-aligned LLMs can be easily bypassed with the introduction of image features. To construct robust MLLMs, we propose…

    Submitted 15 October, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: ECCV2024 (Project Page: https://gyhdog99.github.io/projects/ecso/)

  37. arXiv:2402.05382  [pdf, other]

    cs.CV cs.LG

    Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

    Authors: Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

    Abstract: Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the semantically irrelevant pre-training information might result in negative transfer, impeding MAE's scalability. To address this issue, we propose a novel MAE-based…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2023

  38. KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion

    Authors: Yanbin Wei, Qiushi Huang, James T. Kwok, Yu Zhang

    Abstract: Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications. Many models have been proposed for KGC. They can be categorized into two main classes: triple-based and text-based approaches. Triple-based methods struggle with long-tail entities due to limited structural information and imbalanced entity distributions. Text-based met…

    Submitted 23 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted to EMNLP 2023 Findings

  39. arXiv:2402.02130  [pdf, other]

    cs.CL

    GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning

    Authors: Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James T. Kwok, Yu Zhang

    Abstract: Large Language Models (LLMs) are increasingly used for various tasks with graph structures. Though LLMs can process graph information in a textual format, they overlook the rich vision modality, which is an intuitive way for humans to comprehend structural information and conduct general graph reasoning. The potential benefits and capabilities of representing graph structures as visual images (i.e…

    Submitted 31 October, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2024; Project Page: v-graph.github.io; Code: https://github.com/WEIYanbin1999/GITA/

  40. Compositional Oil Spill Detection Based on Object Detector and Adapted Segment Anything Model from SAR Images

    Authors: Wenhui Wu, Man Sing Wong, Xinyu Yu, Guoqiang Shi, Coco Yin Tung Kwok, Kang Zou

    Abstract: Semantic segmentation-based methods have attracted extensive attention in oil spill detection from SAR images. However, the existing approaches require a large number of finely annotated segmentation samples in the training stage. To alleviate this issue, we propose a composite oil spill detection framework, SAM-OIL, comprising an object detector (e.g., YOLOv8), an Adapted Segment Anything Model (…

    Submitted 22 December, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, published to IEEE Geoscience and Remote Sensing Letters

    Journal ref: IEEE Geoscience and Remote Sensing Letters, vol. 21, pp. 1-5, 2024, Art no. 4007505

  41. arXiv:2312.12379  [pdf, other]

    cs.CV

    Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

    Authors: Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang

    Abstract: Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks. However, the diversity of training tasks of different sources and formats would lead to inevitable task conflicts, where different tasks conflict for the same set of model parameters, resulting in su…

    Submitted 3 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Project website: https://gyhdog99.github.io/projects/mocle/

  42. arXiv:2311.05936  [pdf, ps, other]

    cs.LG

    Aggregation Weighting of Federated Learning via Generalization Bound Estimation

    Authors: Mingwei Xu, Xiaofeng Cao, Ivor W. Tsang, James T. Kwok

    Abstract: Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. However, this naive weighting method may lead to unfairness and degradation in model performance due to statistical heterogeneity and the inclusion of noisy data among clients. Theoretically, distributional robustness analysis has shown that the generalization performan…

    Submitted 10 November, 2023; originally announced November 2023.

  43. arXiv:2310.15301  [pdf, other]

    cs.LG

    ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease

    Authors: Xiaomin Ouyang, Xian Shuai, Yang Li, Li Pan, Xifan Zhang, Heming Fu, Sitong Cheng, Xinyan Wang, Shihua Cao, Jiang Xin, Hazel Mok, Zhenyu Yan, Doris Sau Fung Yu, Timothy Kwok, Guoliang Xing

    Abstract: Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated lear…

    Submitted 12 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  44. arXiv:2310.01886  [pdf, other]

    cs.LG cs.CL cs.CV

    BYOM: Building Your Own Multi-Task Model For Free

    Authors: Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok

    Abstract: Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining. However, existing methods suffer from a large performance deterioration compared to using multiple task-specific models. In this paper, we propose to inject task-specific knowledge into the merged model and design two parameter-efficient approaches (BYOM-FFT and…

    Submitted 3 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Technical Report

  45. arXiv:2309.14360  [pdf, other]

    cs.LG cs.CV

    Domain-Guided Conditional Diffusion Model for Unsupervised Domain Adaptation

    Authors: Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok

    Abstract: Limited transferability hinders the performance of deep learning models when applied to new application scenarios. Recently, Unsupervised Domain Adaptation (UDA) has achieved significant progress in addressing this issue via learning domain-invariant features. However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data. To alleviate thes…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: Work in progress

  46. arXiv:2309.12284  [pdf, other]

    cs.CL cs.AI

    MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

    Authors: Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

    Abstract: Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a fine-tuned language model that specia…

    Submitted 3 May, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: To appear at ICLR 2024 (Spotlight). Project Page: https://meta-math.github.io/

  47. arXiv:2308.12029  [pdf, other]

    cs.LG cs.AI

    Dual-Balancing for Multi-Task Learning

    Authors: Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, James T. Kwok

    Abstract: Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, the task balancing problem remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task b…

    Submitted 29 September, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Technical Report

  48. arXiv:2308.07758  [pdf, other]

    cs.CL cs.AI cs.LG

    Forward-Backward Reasoning in Large Language Models for Mathematical Verification

    Authors: Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok

    Abstract: Self-Consistency samples diverse reasoning chains with answers and chooses the final answer by majority voting. It is based on forward reasoning and cannot further improve performance by sampling more reasoning chains when saturated. To further boost performance, we introduce backward reasoning to verify candidate answers. Specifically, for mathematical tasks, we mask a number in the question and…

    Submitted 4 June, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: Accepted by Findings of ACL 2024

  49. arXiv:2306.05675  [pdf, other]

    cs.CV

    Illumination Controllable Dehazing Network based on Unsupervised Retinex Embedding

    Authors: Jie Gui, Xiaofeng Cong, Lei He, Yuan Yan Tang, James Tin-Yau Kwok

    Abstract: On the one hand, the dehazing task is an ill-posed problem, which means that no unique solution exists. On the other hand, the dehazing task should take into account the subjective factor, which is to give the user selectable dehazed images rather than a single result. Therefore, this paper proposes a multi-output dehazing network by introducing illumination controllable ability, called IC-Deha…

    Submitted 9 June, 2023; originally announced June 2023.

  50. arXiv:2306.00618  [pdf, other]

    cs.CL cs.AI cs.LG

    Effective Structured Prompting by Meta-Learning and Representative Verbalizer

    Authors: Weisen Jiang, Yu Zhang, James T. Kwok

    Abstract: Prompt tuning for pre-trained masked language models (MLM) has shown promising performance in natural language processing tasks with few labeled examples. It tunes a prompt for the downstream task, and a verbalizer is used to bridge the predicted token and label prediction. Due to the limited training data, prompt initialization is crucial for prompt tuning. Recently, MetaPrompting (Hou et al., 20…

    Submitted 21 March, 2024; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted at ICML 2023