
Showing 1–50 of 593 results for author: Wong, K

Searching in archive cs.
  1. arXiv:2507.08205  [pdf, ps, other]

    cs.CV

    HNOSeg-XS: Extremely Small Hartley Neural Operator for Efficient and Resolution-Robust 3D Image Segmentation

    Authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood

    Abstract: In medical image segmentation, convolutional neural networks (CNNs) and transformers are dominant. For CNNs, given the local receptive fields of convolutional layers, long-range spatial correlations are captured through consecutive convolutions and pooling. However, as the computational cost and memory footprint can be prohibitively large, 3D models can only afford fewer layers than 2D models with…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: This paper was accepted by IEEE TMI 2025

  2. arXiv:2507.04903  [pdf, ps, other]

    cs.CR cs.AI cs.DC

    BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning

    Authors: Thinh Dao, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong

    Abstract: Federated Learning (FL) systems are vulnerable to backdoor attacks, where adversaries train their local models on poisoned data and submit poisoned model updates to compromise the global model. Despite numerous proposed attacks and defenses, divergent experimental settings, implementation errors, and unrealistic assumptions hinder fair comparisons and valid conclusions about their effectiveness in…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: Under review at NeurIPS'25

  3. arXiv:2507.03351  [pdf, ps, other]

    eess.SP cs.IT

    Specific Absorption Rate-Aware Multiuser MIMO Assisted by Fluid Antenna System

    Authors: Yuqi Ye, Li You, Hao Xu, Ahmed Elzanaty, Kai-Kit Wong, Xiqi Gao

    Abstract: With the development of the upcoming sixth-generation (6G) wireless networks, there is a pressing need for innovative technologies capable of satisfying heightened performance indicators. The fluid antenna system (FAS) was recently proposed as a promising technique to achieve higher data rates and more diversity gains by dynamically changing the positions of the antennas to form a more desirable channe…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 12 pages, 9 figures, to appear in IEEE Transactions on Wireless Communications

  4. arXiv:2507.03133  [pdf, ps, other]

    cs.CL

    ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models

    Authors: Boyang Xue, Qi Zhu, Rui Wang, Sheng Wang, Hongru Wang, Fei Mi, Yasheng Wang, Lifeng Shang, Qun Liu, Kam-Fai Wong

    Abstract: Although demonstrating remarkable performance on reasoning tasks, Large Language Models (LLMs) still tend to fabricate unreliable responses when confronted with problems that are unsolvable or beyond their capability, severely undermining their reliability. Prior studies of LLM reliability have primarily focused on knowledge tasks to identify unanswerable questions, while mathematical reasoning task…

    Submitted 3 July, 2025; originally announced July 2025.

    Comments: under review

  5. arXiv:2506.20235  [pdf, ps, other]

    cs.LG cs.AI

    Directed Link Prediction using GNN with Local and Global Feature Fusion

    Authors: Yuyang Zhang, Xu Shen, Yu Xie, Ka-Chun Wong, Weidun Xie, Chengbin Peng

    Abstract: Link prediction is a classical problem in graph analysis with many practical applications. For directed graphs, recently developed deep learning approaches typically analyze node similarities through contrastive learning and aggregate neighborhood information through graph convolutions. In this work, we propose a novel graph neural network (GNN) framework to fuse feature embedding with community i…

    Submitted 25 June, 2025; originally announced June 2025.

  6. arXiv:2506.20048  [pdf, ps, other]

    stat.ML cs.LG

    A Principled Path to Fitted Distributional Evaluation

    Authors: Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond Ka Wai Wong

    Abstract: In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation -- developed for expectation-based reinforcement learning -- to the distributional OPE setting. We refer to this extension as fitted distributi…

    Submitted 24 June, 2025; originally announced June 2025.

  7. arXiv:2506.16578  [pdf, ps, other]

    cs.CV

    SafeTriage: Facial Video De-identification for Privacy-Preserving Stroke Triage

    Authors: Tongan Cai, Haomiao Ni, Wenchao Ma, Yuan Xue, Qian Ma, Rachel Leicht, Kelvin Wong, John Volpi, Stephen T. C. Wong, James Z. Wang, Sharon X. Huang

    Abstract: Effective stroke triage in emergency settings often relies on clinicians' ability to identify subtle abnormalities in facial muscle coordination. While recent AI models have shown promise in detecting such patterns from patient facial videos, their reliance on real patient data raises significant ethical and privacy challenges -- especially when training robust and generalizable models across inst…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: IPMI 2025

  8. arXiv:2506.14488  [pdf, ps, other]

    q-bio.BM cs.LG

    Reimagining Target-Aware Molecular Generation through Retrieval-Enhanced Aligned Diffusion

    Authors: Dong Xu, Zhangfan Yang, Ka-chun Wong, Zexuan Zhu, Jiangqiang Li, Junkai Ji

    Abstract: Breakthroughs in high-accuracy protein structure prediction, such as AlphaFold, have established receptor-based molecule design as a critical driver for rapid early-phase drug discovery. However, most approaches still struggle to balance pocket-specific geometric fit with strict valence and synthetic constraints. To resolve this trade-off, a Retrieval-Enhanced Aligned Diffusion termed READ is intr…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 13 pages, 5 figures

  9. arXiv:2506.14288  [pdf, ps, other]

    cs.IT

    Large Language Model Empowered Design of Fluid Antenna Systems: Challenges, Frameworks, and Case Studies for 6G

    Authors: Chao Wang, Kai-Kit Wong, Zan Li, Liang Jin, Chan-Byoung Chae

    Abstract: The Fluid Antenna System (FAS), which enables flexible Multiple-Input Multiple-Output (MIMO) communications, introduces new spatial degrees of freedom for next-generation wireless networks. Unlike traditional MIMO, FAS involves joint port selection and precoder design, a combinatorial NP-hard optimization problem. Moreover, fully leveraging FAS requires acquiring Channel State Information (CSI) ac…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 9 pages

    MSC Class: fluid antenna system

  10. arXiv:2506.13317  [pdf, ps, other]

    cs.IT eess.SP

    A Contemporary Survey on Fluid Antenna Systems: Fundamentals and Networking Perspectives

    Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Xinghao Guo, Farshad Rostami Ghadi, Yu Chen, Yin Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong, Yangyang Zhang

    Abstract: The explosive growth of teletraffic, fueled by the convergence of cyber-physical systems and data-intensive applications, such as the Internet of Things (IoT), autonomous systems, and immersive communications, demands a multidisciplinary suite of innovative solutions across the physical and network layers. Fluid antenna systems (FAS) represent a transformative advancement in antenna design, offeri…

    Submitted 16 June, 2025; originally announced June 2025.

  11. arXiv:2506.07986  [pdf, ps, other]

    cs.CV

    Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

    Authors: Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, Wangmeng Zuo, Ziwei Liu, Kwan-Yee K. Wong

    Abstract: Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-driven visual generation. However, even state-of-the-art MM-DiT models like FLUX struggle with achieving precise alignment between text prompts and generated content. We identify two key issues in the attention mechanism of MM-DiT, namely 1) the suppression of cross-modal attention due to token imbalance between…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Project Page: https://vchitect.github.io/TACA/

  12. arXiv:2506.07362  [pdf, ps, other]

    cs.IT eess.SP

    Fluid Antenna-Empowered Receive Spatial Modulation

    Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Hanjiang Hong, Kai-Kit Wong, Chan-Byoung Chae, Wenjun Zhang, Yiyan Wu

    Abstract: Fluid antenna (FA), as an emerging antenna technology, fully exploits spatial diversity. This paper integrates FA with the receive spatial modulation (RSM) scheme and proposes a novel FA-empowered RSM (FA-RSM) system. In this system, the transmitter is equipped with an FA that simultaneously activates multiple ports to transmit precoded signals. We address three key challenges in the FA-RSM system…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: 12 pages, submitted to IEEE Journal

  13. arXiv:2506.05569  [pdf, ps, other]

    cs.IT eess.SP

    Fluid Antenna System-Assisted Self-Interference Cancellation for In-Band Full Duplex Communications

    Authors: Hanjiang Hong, Kai-Kit Wong, Hao Xu, Yiyan Wu, Sai Xu, Chan-Byoung Chae, Baiyang Liu, Kin-Fai Tong

    Abstract: In-band full-duplex (IBFD) systems are expected to double the spectral efficiency compared to half-duplex systems, provided that loopback self-interference (SI) can be effectively suppressed. The inherent interference mitigation capabilities of the emerging fluid antenna system (FAS) technology make it a promising candidate for addressing the SI challenge in IBFD systems. This paper thus proposes…

    Submitted 5 June, 2025; originally announced June 2025.

  14. arXiv:2506.03850  [pdf, ps, other]

    cs.LG

    Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning

    Authors: Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong

    Abstract: Harmful fine-tuning (HFT), performed directly on open-source LLMs or through Fine-tuning-as-a-Service, breaks safety alignment and poses significant threats. Existing methods aim to mitigate HFT risks by learning robust representation on alignment data or making harmful data unlearnable, but they treat each data sample equally, leaving data vulnerability patterns understudied. In this work, we rev…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: ICML 2025

  15. arXiv:2506.03123  [pdf, ps, other]

    cs.CV

    DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

    Authors: Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu

    Abstract: Diffusion Models have achieved remarkable results in video synthesis but require iterative denoising steps, leading to substantial computational overhead. Consistency Models have made significant progress in accelerating diffusion models. However, directly applying them to video diffusion models often results in severe degradation of temporal consistency and appearance details. In this paper, by a…

    Submitted 3 June, 2025; originally announced June 2025.

  16. arXiv:2506.00886  [pdf, ps, other]

    cs.AI

    Toward a Theory of Agents as Tool-Use Decision-Makers

    Authors: Hongru Wang, Cheng Qian, Manling Li, Jiahao Qiu, Boyang Xue, Mengdi Wang, Heng Ji, Kam-Fai Wong

    Abstract: As Large Language Models (LLMs) evolve into increasingly autonomous agents, fundamental questions about their epistemic foundations remain unresolved: What defines an agent? How should it make decisions? And what objectives should guide its behavior? In this position paper, we argue that true autonomy requires agents to be grounded in a coherent epistemic framework that governs what they know, wha…

    Submitted 1 June, 2025; originally announced June 2025.

  17. arXiv:2505.24873  [pdf, ps, other]

    cs.CV

    MiniMax-Remover: Taming Bad Noise Helps Video Object Removal

    Authors: Bojia Zi, Weixuan Peng, Xianbiao Qi, Jianan Wang, Shihao Zhao, Rong Xiao, Kam-Fai Wong

    Abstract: Recent advances in video diffusion models have driven rapid progress in video editing techniques. However, video object removal, a critical subtask of video editing, remains challenging due to issues such as hallucinated objects and visual artifacts. Furthermore, existing methods often rely on computationally expensive sampling procedures and classifier-free guidance (CFG), resulting in slow infer…

    Submitted 30 May, 2025; originally announced May 2025.

  18. arXiv:2505.23680  [pdf, ps, other]

    cs.IT eess.SP

    Performance Analysis of Wireless Communication Systems Assisted by Fluid Reconfigurable Intelligent Surfaces

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, F. Javier Lopez-Martinez, George C. Alexandropoulos, Chan-Byoung Chae

    Abstract: This letter investigates the performance of emerging wireless communication systems assisted by a fluid reconfigurable intelligent surface (FRIS). Unlike conventional reconfigurable intelligent surfaces (RISs), an FRIS consists of fluid-inspired metamaterials arranged in a densely packed matrix of sub-elements over a surface. It dynamically activates specific elements for signal reflection and mod…

    Submitted 29 May, 2025; originally announced May 2025.

  19. arXiv:2505.20231  [pdf, ps, other]

    cs.CL

    Bridging the Long-Term Gap: A Memory-Active Policy for Multi-Session Task-Oriented Dialogue

    Authors: Yiming Du, Bingbing Wang, Yang He, Bin Liang, Baojun Wang, Zhongyang Li, Lin Gui, Jeff Z. Pan, Ruifeng Xu, Kam-Fai Wong

    Abstract: Existing Task-Oriented Dialogue (TOD) systems primarily focus on single-session dialogues, limiting their effectiveness in long-term memory augmentation. To address this challenge, we introduce an MS-TOD dataset, the first multi-session TOD dataset designed to retain long-term memory across sessions, enabling fewer turns and more efficient task completion. This defines a new benchmark task for eval…

    Submitted 26 May, 2025; originally announced May 2025.

  20. arXiv:2505.18628  [pdf, ps, other]

    cs.IT

    Multi-Subarray FD-RIS Enhanced Multi-user Wireless Networks: With Joint Distance-Angle Beamforming

    Authors: Han Xiao, Xiaoyan Hu, Wenjie Wang, Kai-Kit Wong, Kun Yang, Shi Jin

    Abstract: The concept of the frequency diverse reconfigurable intelligent surface (FD-RIS) technology has been introduced, which can enable simultaneous implementation of distance-angle beamforming in far-field communication scenarios. In order to improve the ability to manage undesired harmonic signals and the diversity of frequency offsets, this paper presents a novel multi-subarray FD-RIS framework. In…

    Submitted 24 May, 2025; originally announced May 2025.

  21. arXiv:2505.17829  [pdf, other]

    cs.CL

    Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning

    Authors: Zezhong Wang, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Mathematical reasoning through Chain-of-Thought (CoT) has emerged as a powerful capability of Large Language Models (LLMs), which can be further enhanced through Test-Time Scaling (TTS) methods like Beam Search and DVTS. However, these methods, despite improving accuracy by allocating more computational resources during inference, often suffer from path homogenization and inefficient use of interm…

    Submitted 23 May, 2025; originally announced May 2025.

  22. arXiv:2505.17448  [pdf, ps, other]

    cs.LG cs.CV

    Baitradar: A Multi-Model Clickbait Detection Algorithm Using Deep Learning

    Authors: Bhanuka Gamage, Adnan Labib, Aisha Joomun, Chern Hong Lim, KokSheik Wong

    Abstract: Following the rising popularity of YouTube, there is an emerging problem on this platform called clickbait, which provokes users to click on videos using attractive titles and thumbnails. As a result, users end up watching a video whose content does not match what the title publicized. This issue is addressed in this study by proposing an algorithm called BaitRadar, which uses a deep learning…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Appeared in the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'21), Toronto, ON, Canada

  23. arXiv:2505.17433  [pdf, ps, other]

    cs.AI

    MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models

    Authors: Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu

    Abstract: Memes have emerged as a popular form of multimodal online communication, where their interpretation heavily depends on the specific context in which they appear. Current approaches predominantly focus on isolated meme analysis, either for harmful content detection or standalone interpretation, overlooking a fundamental challenge: the same meme can express different intents depending on its convers…

    Submitted 4 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  24. arXiv:2505.17427  [pdf, ps, other]

    cs.CL

    T$^2$: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering

    Authors: Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Huimin Wang, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu

    Abstract: Recent advances in Large Language Models (LLMs) have demonstrated remarkable performance in Contextual Question Answering (CQA). However, prior approaches typically employ elaborate reasoning strategies regardless of question complexity, leading to low adaptability. Recent efficient test-time scaling methods introduce budget constraints or early stop mechanisms to avoid overthinking for straightfo…

    Submitted 4 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: substantial text overlap with arXiv:2503.22985

  25. arXiv:2505.15200  [pdf, ps, other]

    cs.IT eess.SP

    Performance Analysis of Fluid Antenna System under Spatially-Correlated Rician Fading Channels

    Authors: Jiangsheng Huangfu, Zhengyu Song, Tianwei Hou, Anna Li, Yuanwei Liu, Arumugam Nallanathan, Kai-Kit Wong

    Abstract: Fluid antenna systems (FAS) are among the most promising technologies for the sixth generation (6G) mobile communication networks. Unlike traditional fixed-position multiple-input multiple-output (MIMO) systems, an FAS possesses position reconfigurability to switch on-demand among $N$ predefined ports over a prescribed space. This paper explores the performance of a single-input single-output (SISO…

    Submitted 21 May, 2025; originally announced May 2025.

  26. arXiv:2505.14116  [pdf, ps, other]

    cs.CL

    Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

    Authors: Hongru Wang, Deng Cai, Wanjun Zhong, Shijue Huang, Jeff Z. Pan, Zeming Liu, Kam-Fai Wong

    Abstract: Inference-time scaling, which significantly enhances the performance of Large Language Models (LLMs) in complex reasoning tasks by increasing the length of Chain-of-Thought, has attracted much attention. These longer intermediate reasoning rationales embody various meta-reasoning skills in human cognition, such as reflection and decomposition, and are difficult to create and acquire. In this work, we i…

    Submitted 20 May, 2025; originally announced May 2025.

  27. arXiv:2505.13616  [pdf, other]

    cs.IT eess.SP

    FIRES: Fluid Integrated Reflecting and Emitting Surfaces

    Authors: Farshad Rostami Ghadi, Kai-Kit Wong, Masoud Kaveh, F. Javier Lopez-Martinez, Chan-Byoung Chae, George C. Alexandropoulos

    Abstract: This letter introduces the concept of fluid integrated reflecting and emitting surface (FIRES), which constitutes a new paradigm seamlessly integrating the flexibility of fluid-antenna systems (FASs) with the dual functionality of simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs). The potential of the proposed metasurface structure is studied through an FIRES-…

    Submitted 19 May, 2025; originally announced May 2025.

  28. arXiv:2505.13328  [pdf, other]

    cs.CL

    Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges

    Authors: Hongru Wang, Wenyu Huang, Yufei Wang, Yuanhao Xi, Jianqiao Lu, Huan Zhang, Nan Hu, Zeming Liu, Jeff Z. Pan, Kam-Fai Wong

    Abstract: Existing benchmarks that assess Language Models (LMs) as Language Agents (LAs) for tool use primarily focus on stateless, single-turn interactions or partial evaluations, such as tool selection in a single turn, overlooking the inherent stateful nature of interactions in multi-turn applications. To fill this gap, we propose DialogTool, a multi-turn dialogue dataset with stateful tool i…

    Submitted 19 May, 2025; originally announced May 2025.

  29. arXiv:2505.08838  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

    Authors: Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

    Abstract: Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveragi…

    Submitted 19 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  30. arXiv:2505.03841  [pdf, other]

    cs.RO eess.SY

    Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions

    Authors: Kiwan Wong, Maximilian Stölzle, Wei Xiao, Cosimo Della Santina, Daniela Rus, Gioele Zardini

    Abstract: Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 8 pages

  31. arXiv:2505.02344  [pdf, other]

    cs.CR

    An End-to-End Model For Logits Based Large Language Models Watermarking

    Authors: Kahim Wong, Jicheng Zhou, Jiantao Zhou, Yain-Whar Si

    Abstract: The rise of LLMs has increased concerns over source tracing and copyright protection for AIGC, highlighting the need for advanced detection technologies. Passive detection methods usually face high false positives, while active watermarking techniques using logits or sampling manipulation offer more effective protection. Existing LLM watermarking methods, though effective on unaltered content, suf…

    Submitted 22 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

  32. arXiv:2505.01988  [pdf, ps, other]

    cs.IT

    Sparse Code Transceiver Design for Unsourced Random Access with Analytical Power Division in Gaussian MAC

    Authors: Zhentian Zhang, Mohammad Javad Ahmadi, Jian Dang, Kai-Kit Wong, Zaichen Zhang, Christos Masouros

    Abstract: In this work, we discuss the problem of unsourced random access (URA) over a Gaussian multiple access channel (GMAC). To address the challenges posed by emerging massive machine-type connectivity, URA reframes multiple access as a coding-theoretic problem. The sparse code-oriented schemes are highly valued because they are widely used in existing protocols, making their implementation require only…

    Submitted 4 May, 2025; originally announced May 2025.

  33. arXiv:2505.01458  [pdf, other]

    cs.RO cs.AI

    A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI

    Authors: Lik Hang Kenny Wong, Xueyang Kang, Kaixin Bai, Jianwei Zhang

    Abstract: Navigation and manipulation are core capabilities in Embodied AI, yet training agents with these capabilities in the real world faces high costs and time complexity. Therefore, sim-to-real transfer has emerged as a key approach, yet the sim-to-real gap persists. This survey examines how physics simulators address this gap by analyzing their properties overlooked in previous surveys. We also analyz…

    Submitted 1 May, 2025; originally announced May 2025.

  34. arXiv:2505.00675  [pdf, other]

    cs.CL

    Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

    Authors: Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sebastien Montella, Mirella Lapata, Kam-Fai Wong, Jeff Z. Pan

    Abstract: Memory is a fundamental component of AI systems, underpinning large language model (LLM)-based agents. While prior surveys have focused on memory applications with LLMs (e.g., enabling personalized memory in conversational agents), they often overlook the atomic operations that underlie memory dynamics. In this survey, we first categorize memory representations into parametric and contextual for…

    Submitted 27 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  35. arXiv:2504.19162  [pdf, other]

    cs.CL cs.AI cs.LG

    SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

    Authors: Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong

    Abstract: Evaluating the step-by-step reliability of large language model (LLM) reasoning, such as Chain-of-Thought, remains challenging due to the difficulty and cost of obtaining high-quality step-level supervision. In this paper, we introduce Self-Play Critic (SPC), a novel approach where a critic model evolves its ability to assess reasoning steps through adversarial self-play games, eliminating the nee…

    Submitted 17 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

    Comments: Project webpage: https://chen-judge.github.io/SPC/

  36. arXiv:2504.17629  [pdf, ps, other]

    cs.IT

    Integrated Sensing and Communications for Unsourced Random Access: A Spectrum Sharing Compressive Sensing Approach

    Authors: Zhentian Zhang, Jian Dang, Kai-Kit Wong, Zaichen Zhang, Christos Masouros

    Abstract: This paper addresses the unsourced/uncoordinated random access problem in an integrated sensing and communications (ISAC) system, with a focus on uplink multiple access code design. Recent theoretical advancements highlight that an ISAC system will be overwhelmed by the increasing number of active devices, driven by the growth of massive machine-type communication (mMTC). To meet the demands of fu…

    Submitted 24 April, 2025; originally announced April 2025.

  37. arXiv:2504.15093  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Rethinking the Potential of Multimodality in Collaborative Problem Solving Diagnosis with Large Language Models

    Authors: K. Wong, B. Wu, S. Bulathwela, M. Cukurova

    Abstract: Detecting collaborative and problem-solving behaviours from digital traces to interpret students' collaborative problem solving (CPS) competency is a long-term goal in the Artificial Intelligence in Education (AIEd) field. Although multimodal data and advanced models are argued to have the potential to detect complex CPS behaviours, empirical evidence on their value remains limited with some contr…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted for 26th International Conference on Artificial Intelligence in Education (AIED 2025), 22 - 26 July 2025, Palermo, Italy. 17 pages, 1 figure

  38. arXiv:2504.14870  [pdf, ps, other]

    cs.AI cs.CL

    Acting Less is Reasoning More! Teaching Model to Act Efficiently

    Authors: Hongru Wang, Cheng Qian, Wanjun Zhong, Xiusi Chen, Jiahao Qiu, Shijue Huang, Bowen Jin, Mengdi Wang, Kam-Fai Wong, Heng Ji

    Abstract: Tool-integrated reasoning (TIR) augments large language models (LLMs) with the ability to invoke external tools during long-form reasoning, such as search engines and code interpreters, to solve tasks beyond the capabilities of internal reasoning. While reinforcement learning (RL) has shown promise in training such agents, most existing approaches typically optimize only for final correctness w…

    Submitted 31 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  39. arXiv:2504.14653  [pdf, other]

    cs.IT eess.SP

    Wireless Large AI Model: Shaping the AI-Native Future of 6G and Beyond

    Authors: Fenghao Zhu, Xinquan Wang, Xinyi Li, Maojun Zhang, Yixuan Chen, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Zhaoyang Zhang, Richeng Jin, Yongming Huang, Wei Feng, Tingting Yang, Baoming Bai, Feifei Gao, Kun Yang, Yuanwei Liu, Sami Muhaidat, Chau Yuen, Kaibin Huang, Kai-Kit Wong, Dusit Niyato, Mérouane Debbah

    Abstract: The emergence of sixth-generation and beyond communication systems is expected to fundamentally transform digital experiences through introducing unparalleled levels of intelligence, efficiency, and connectivity. A promising technology poised to enable this revolutionary vision is the wireless large AI model (WLAM), characterized by its exceptional capabilities in data processing, inference, and d…

    Submitted 28 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  40. arXiv:2504.06836  [pdf, other]

    cs.CV

    Determining Fetal Orientations From Blind Sweep Ultrasound Video

    Authors: Jakub Maciej Wiśniewski, Anders Nymark Christensen, Mary Le Ngo, Martin Grønnebæk Tolsgaard, Chun Kit Wong

    Abstract: Cognitive demands of fetal ultrasound examinations pose unique challenges for clinicians. With the goal of providing an assistive tool, we developed an automated pipeline for predicting fetal orientation from ultrasound videos acquired following a simple blind sweep protocol. Leveraging a pre-trained head detection and segmentation model, this is achieved by first determining the fetal presen…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: 10 pages

  41. arXiv:2504.04098  [pdf, ps, other]

    cs.IR

    RIS-Empowered Integrated Location Sensing and Communication with Superimposed Pilots

    Authors: Wenchao Xia, Ben Zhao, Wankai Tang, Yongxu Zhu, Kai-Kit Wong, Sangarapillai Lambotharan, Hyundong Shin

    Abstract: In addition to enhancing wireless communication coverage quality, the reconfigurable intelligent surface (RIS) technique can also assist in positioning. In this work, we consider RIS-assisted superimposed pilot and data transmission without assuming the availability of prior channel state information and position information of mobile user equipments (UEs). To tackle this challenge, we design a fram…

    Submitted 5 April, 2025; originally announced April 2025.

  42. arXiv:2504.03183  [pdf, other]

    cs.IT

    On Fundamental Limits for Fluid Antenna-assisted Integrated Sensing and Communications for Unsourced Random Access

    Authors: Zhentian Zhang, Kai-Kit Wong, Jian Dang, Zaichen Zhang, Chan-Byoung Chae

    Abstract: This paper investigates the unsourced random access (URA) problem for integrated sensing and communications (ISAC). Recent results reveal that conventional multiple access strategies for ISAC such as treating interference as noise (TIN) and time-division multiple access (TDMA) can be easily overwhelmed and fail to support the surging number of active users. Hence, the unsourced ISAC (UN…

    Submitted 4 April, 2025; originally announced April 2025.

  43. arXiv:2504.03128  [pdf, other]

    cs.CV

    FontGuard: A Robust Font Watermarking Approach Leveraging Deep Font Knowledge

    Authors: Kahim Wong, Jicheng Zhou, Kemou Li, Yain-Whar Si, Xiaowei Wu, Jiantao Zhou

    Abstract: The proliferation of AI-generated content raises significant concerns about forensic and security issues such as source tracing and copyright protection, highlighting the need for effective watermarking technologies. Font-based text watermarking has emerged as an effective solution to embed information, which could ensure copyright, traceability, and compliance of the generated text content. Ex…

    Submitted 3 April, 2025; originally announced April 2025.

  44. SCNR Maximization for MIMO ISAC Assisted by Fluid Antenna System

    Authors: Yuqi Ye, Li You, Hao Xu, Ahmed Elzanaty, Kai-Kit Wong, Xiqi Gao

    Abstract: The integrated sensing and communication (ISAC) technology has been extensively researched to enhance communication rates and radar sensing capabilities. Additionally, a new technology known as fluid antenna system (FAS) has recently been proposed to obtain higher communication rates for future wireless networks by dynamically altering the antenna position to obtain a more favorable channel condit…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: 6 Pages, 3 figures, to appear in IEEE Transactions on Vehicular Technology

  45. arXiv:2504.00906  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

    Authors: Saaket Agashe, Kyle Wong, Vincent Tu, Jiachen Yang, Ang Li, Xin Eric Wang

    Abstract: Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries. However, current agents face significant challenges: imprecise grounding of GUI elements, difficulties with long-horizon task planning, and performanc…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 18 pages, 13 figures, 8 tables

  46. arXiv:2503.24377  [pdf, other]

    cs.CL cs.AI

    Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

    Authors: Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong

    Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, transitioning from fast and intuitive thinking (System 1) to slow and deep reasoning (System 2). While System 2 reasoning improves task accuracy, it often incurs substantial computational costs due to its slow thinking nature and inefficient or unnecessary reasoning beh…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: In Progress; Paper list Repo: https://github.com/DevoAllen/Awesome-Reasoning-Economy-Papers

  47. arXiv:2503.23673  [pdf, other]

    cs.CL

    WHERE and WHICH: Iterative Debate for Biomedical Synthetic Data Augmentation

    Authors: Zhengyi Zhao, Shubo Zhang, Bin Liang, Binyang Li, Kam-Fai Wong

    Abstract: In Biomedical Natural Language Processing (BioNLP) tasks, such as Relation Extraction, Named Entity Recognition, and Text Classification, the scarcity of high-quality data remains a significant challenge. This limitation impairs the ability of large language models to correctly understand relationships between biological entities, such as molecules and diseases, or drug interactions, and further results in poten…

    Submitted 30 March, 2025; originally announced March 2025.

  48. arXiv:2503.23078  [pdf, other]

    cs.CL

    EventWeave: A Dynamic Framework for Capturing Core and Supporting Events in Dialogue Systems

    Authors: Zhengyi Zhao, Shubo Zhang, Yiming Du, Bin Liang, Baojun Wang, Zhongyang Li, Binyang Li, Kam-Fai Wong

    Abstract: Existing large language models (LLMs) have shown remarkable progress in dialogue systems. However, many approaches still overlook the fundamental role of events throughout multi-turn interactions, leading to incomplete context tracking. Without tracking these events, dialogue systems often lose coherence and miss subtle shifts in user intent, causing disjointed responses. To bridge this g…

    Submitted 29 March, 2025; originally announced March 2025.

  49. arXiv:2503.22985  [pdf, other]

    cs.CL

    FReM: A Flexible Reasoning Mechanism for Balancing Quick and Slow Thinking in Long-Context Question Answering

    Authors: Zhengyi Zhao, Shubo Zhang, Zezhong Wang, Bin Liang, Binyang Li, Kam-Fai Wong

    Abstract: Long-context question-answering (LCQA) systems have greatly benefited from the powerful reasoning capabilities of large language models (LLMs), which can be categorized into slow and quick reasoning modes. However, both modes have their limitations. Slow thinking generally tends to explore every possible reasoning path, which leads to heavy overthinking and wastes time. Quick thinking usually reli…

    Submitted 29 March, 2025; originally announced March 2025.

  50. arXiv:2503.16751  [pdf, other]

    cs.IT eess.SP

    UAV-Relay Assisted RSMA Fluid Antenna System: Outage Probability Analysis

    Authors: Farshad Rostami Ghadi, Masoud Kaveh, Francisco Hernando-Gallego, Diego Martin, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: This letter studies the impact of fluid antenna system (FAS) technology on the performance of unmanned aerial vehicle (UAV)-assisted multiuser communication networks. Specifically, we consider a scenario where a fixed-position antenna (FPA) base station (BS) serves K FAS-equipped users with the assistance of a UAV acting as an aerial relay. The BS employs rate-splitting multiple access (RSMA), whi…

    Submitted 20 March, 2025; originally announced March 2025.