-
Hybrid Near-Far Field 6D Movable Antenna Design Exploiting Directional Sparsity and Deep Learning
Authors:
Xiaodan Shao,
Limei Hu,
Yulong Sun,
Xing Li,
Yixiao Zhang,
Jingze Ding,
Xiaoming Shi,
Feng Chen,
Derrick Wing Kwan Ng,
Robert Schober
Abstract:
Six-dimensional movable antenna (6DMA) has been identified as a new disruptive technology for future wireless systems to support a large number of users with only a few antennas. However, the intricate relationships between the signal carrier wavelength and the transceiver region size lead to inaccuracies in traditional far-field 6DMA channel model, causing discrepancies between the model predicti…
▽ More
Six-dimensional movable antenna (6DMA) has been identified as a new disruptive technology for future wireless systems to support a large number of users with only a few antennas. However, the intricate relationships between the signal carrier wavelength and the transceiver region size lead to inaccuracies in traditional far-field 6DMA channel model, causing discrepancies between the model predictions and the hybrid-field channel characteristics in practical 6DMA systems, where users might be in the far-field region relative to the antennas on the same 6DMA surface, while simultaneously being in the near-field region relative to different 6DMA surfaces. Moreover, due to the high-dimensional channel and the coupled position and rotation constraints, the estimation of the 6DMA channel and the joint design of the 6DMA positions and rotations and the transmit beamforming at the base station (BS) incur extremely high computational complexity. To address these issues, we propose an efficient hybrid-field generalized 6DMA channel model, which accounts for planar-wave propagation within individual 6DMA surfaces and spherical-wave propagation among different 6DMA surfaces. Furthermore, by leveraging directional sparsity, we propose a low-overhead channel estimation algorithm that efficiently constructs a complete channel map for all potential antenna position-rotation pairs while limiting the training overhead incurred by antenna movement. In addition, we propose a low-complexity design leveraging deep reinforcement learning (DRL), which facilitates the joint design of the 6DMA positions, rotations, and beamforming in a unified manner. Numerical results demonstrate that the proposed hybrid-field channel model and channel estimation algorithm outperform existing approaches and that the DRL-enhanced 6DMA system significantly surpasses flexible antenna systems.
△ Less
Submitted 18 June, 2025;
originally announced June 2025.
-
Just Go Parallel: Improving the Multilingual Capabilities of Large Language Models
Authors:
Muhammad Reza Qorib,
Junyi Li,
Hwee Tou Ng
Abstract:
Large language models (LLMs) have demonstrated impressive translation capabilities even without being explicitly trained on parallel data. This remarkable property has led some to believe that parallel data is no longer necessary for building multilingual language models. While some attribute this to the emergent abilities of LLMs due to scale, recent work suggests that it is actually caused by in…
▽ More
Large language models (LLMs) have demonstrated impressive translation capabilities even without being explicitly trained on parallel data. This remarkable property has led some to believe that parallel data is no longer necessary for building multilingual language models. While some attribute this to the emergent abilities of LLMs due to scale, recent work suggests that it is actually caused by incidental bilingual signals present in the training data. Various methods have been proposed to maximize the utility of parallel data to enhance the multilingual capabilities of multilingual encoder-based and encoder-decoder language models. However, some decoder-based LLMs opt to ignore parallel data instead. In this work, we conduct a systematic study on the impact of adding parallel data on LLMs' multilingual capabilities, focusing specifically on translation and multilingual common-sense reasoning. Through controlled experiments, we demonstrate that parallel data can significantly improve LLMs' multilingual capabilities.
△ Less
Submitted 15 June, 2025;
originally announced June 2025.
-
GenFT: A Generative Parameter-Efficient Fine-Tuning Method for Pretrained Foundation Models
Authors:
Baoquan Zhang,
Guangning Xu,
Michael. K. Ng
Abstract:
Pretrained Foundation Models (PFMs) have transformed numerous applications by enabling efficient adaptation to customized tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a resource-efficient alternative to full fine-tuning, especially leveraging reparameterized weights $ΔW$ to adapt models for downstream tasks. However, a critical yet underexplored question remains: can we utilize wel…
▽ More
Pretrained Foundation Models (PFMs) have transformed numerous applications by enabling efficient adaptation to customized tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a resource-efficient alternative to full fine-tuning, especially leveraging reparameterized weights $ΔW$ to adapt models for downstream tasks. However, a critical yet underexplored question remains: can we utilize well-pretrained weights $W_0$ to guide the update of task-specific $ΔW$, avoiding inefficient training it from scratch? To end this, we propose Generative Parameter-Efficient Fine-Tuning (GenFT), a novel method that extracts structured, transferable information from $W_0$ for efficient $ΔW$ training. To extract row and column structure information, GenFT applies row and column transformations to distill essential patterns from $W_0$. A tailored policy further decomposes $ΔW$ into layer-shared and layer-specific components, balancing information reuse and individualized flexibility. GenFT is simple yet effective, achieving superior performance across CV and NLP tasks. Extensive experiments on VTAB-1K, FGVC, and GLUE benchmarks demonstrate that GenFT outperforms state-of-the-art PEFT methods, offering a new perspective for efficient model adaptation.
△ Less
Submitted 21 May, 2025;
originally announced June 2025.
-
Variational Inference Optimized Using the Curved Geometry of Coupled Free Energy
Authors:
Kenric Nelson,
Igor Oliveira,
Amenah Al-Najafi,
Fode Zhang,
Hon Keung Tony Ng
Abstract:
We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student's t. By leveraging the coupled free energy, which is equal to the coupled evidence…
▽ More
We introduce an optimization framework for variational inference based on the coupled free energy, extending variational inference techniques to account for the curved geometry of the coupled exponential family. This family includes important heavy-tailed distributions such as the generalized Pareto and the Student's t. By leveraging the coupled free energy, which is equal to the coupled evidence lower bound (ELBO) of the inverted probabilities, we improve the accuracy and robustness of the learned model. The coupled generalization of Fisher Information metric and the affine connection. The method is applied to the design of a coupled variational autoencoder (CVAE). By using the coupling for both the distributions and cost functions, the reconstruction metric is derived to still be the mean-square average loss with modified constants. The novelty comes from sampling the heavy-tailed latent distribution with its associated coupled probability, which has faster decaying tails. The result is the ability to train a model robust against severe outliers, while assuring that the training process is stable. The Wasserstein-2 or Fréchet Inception Distance of the reconstructed CelebA images shows the CVAE has a 3\% improvement over the VAE after 5 epochs of training.
△ Less
Submitted 16 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Mathesis: Towards Formal Theorem Proving from Natural Languages
Authors:
Yu Xuejun,
Jianyuan Zhong,
Zijin Feng,
Pengyi Zhai,
Roozbeh Yousefzadeh,
Wei Chong Ng,
Haoxiong Liu,
Ziyi Shou,
Jing Xiong,
Yudong Zhou,
Claudia Beth Ong,
Austen Jeremy Sugiarto,
Yaoxi Zhang,
Wai Ming Tai,
Huan Cao,
Dongcai Lu,
Jiacheng Sun,
Qiang Xu,
Shen Xin,
Zhenguo Li
Abstract:
Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem…
▽ More
Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as inputs, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem proving pipeline processing informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer using reinforcement learning to enhance the formalization ability of natural language problems, aided by our novel LeanScorer framework for nuanced formalization quality assessment. It also proposes a Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. Our approach is carefully designed, with a thorough study of each component. Experiments demonstrate Mathesis's effectiveness, with the autoformalizer outperforming the best baseline by 22% in pass-rate on Gaokao-Formal. The full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.
△ Less
Submitted 8 June, 2025;
originally announced June 2025.
-
LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles
Authors:
Ho Yin 'Sam' Ng,
Ting-Yao Hsu,
Aashish Anantha Ramakrishnan,
Branislav Kveton,
Nedim Lipka,
Franck Dernoncourt,
Dongwon Lee,
Tong Yu,
Sungchul Kim,
Ryan A. Rossi,
Ting-Hao 'Kenneth' Huang
Abstract:
Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain's style, highlighting the need for personalization. Despite languag…
▽ More
Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain's style, highlighting the need for personalization. Despite language models' personalization (LaMP) advances, these technologies often focus on text-only settings and rarely address scenarios where both inputs and profiles are multimodal. This paper introduces LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides not only the needed inputs, such as figure images, but also up to three other figures from the same document--each with its image, caption, and figure-mentioning paragraphs--as a profile to characterize the context. Experiments with four LLMs show that using profile information consistently helps generate captions closer to the original author-written ones. Ablation studies reveal that images in the profile are more helpful than figure-mentioning paragraphs, highlighting the advantage of using multimodal profiles over text-only ones.
△ Less
Submitted 17 June, 2025; v1 submitted 6 June, 2025;
originally announced June 2025.
-
Polarized 6D Movable Antenna for Wireless Communication: Channel Modeling and Optimization
Authors:
Xiaodan Shao,
Qijun Jiang,
Derrick Wing Kwan Ng,
Naofal Al-Dhahir
Abstract:
In this paper, we propose a novel polarized six-dimensional movable antenna (P-6DMA) to enhance the performance of wireless communication cost-effectively. Specifically, the P-6DMA enables polarforming by adaptively tuning the antenna's polarization electrically as well as controls the antenna's rotation mechanically, thereby exploiting both polarization and spatial diversity to reconfigure wirele…
▽ More
In this paper, we propose a novel polarized six-dimensional movable antenna (P-6DMA) to enhance the performance of wireless communication cost-effectively. Specifically, the P-6DMA enables polarforming by adaptively tuning the antenna's polarization electrically as well as controls the antenna's rotation mechanically, thereby exploiting both polarization and spatial diversity to reconfigure wireless channels for improving communication performance. First, we model the P-6DMA channel in terms of transceiver antenna polarforming vectors and antenna rotations. We then propose a new two-timescale transmission protocol to maximize the weighted sum-rate for a P-6DMA-enhanced multiuser system. Specifically, antenna rotations at the base station (BS) are first optimized based on the statistical channel state information (CSI) of all users, which varies at a much slower rate compared to their instantaneous CSI. Then, transceiver polarforming vectors are designed to cater to the instantaneous CSI under the optimized BS antennas' rotations. Under the polarforming phase shift and amplitude constraints, a new polarforming and rotation joint design problem is efficiently addressed by a low-complexity algorithm based on penalty dual decomposition, where the polarforming coefficients are updated in parallel to reduce computational time. Simulation results demonstrate the significant performance advantages of polarforming, antenna rotation, and their joint design in comparison with various benchmarks without polarforming or antenna rotation adaptation.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
A Pre-trained Framework for Multilingual Brain Decoding Using Non-invasive Recordings
Authors:
Yi Guo,
Yihang Dong,
Michael Kwok-Po Ng,
Shuqiang Wang
Abstract:
Brain-computer interfaces (BCIs) with speech decoding from brain recordings have broad application potential in fields such as clinical rehabilitation and cognitive neuroscience. However, current decoding methods remain limited to single-language, single-subject, and single neuroimaging modality settings, restricting their clinical applicability and generalizability. Here we propose a joint multil…
▽ More
Brain-computer interfaces (BCIs) with speech decoding from brain recordings have broad application potential in fields such as clinical rehabilitation and cognitive neuroscience. However, current decoding methods remain limited to single-language, single-subject, and single neuroimaging modality settings, restricting their clinical applicability and generalizability. Here we propose a joint multilingual, multi-subject and multimodal decoding framework. It maps diverse brain recordings into a unified semantic space defined by a pre-trained multilingual model (PMM), enabling decoding across multiple languages, multiple subjects and multiple neuroimaging modalities. The proposed framework is validated using non-invasive brain recordings from 159 participants across four languages. Experimental results show that it exhibits strong generalization across multilingual, multi-subject, and multimodal settings. More importantly, the proposed framework can promote linguistic fairness, which is vital for underrepresented languages in BCI applications. The unified semantic space enables cross-lingual mapping enhancement, allowing the framework to boost the decoding performance of underrepresented languages, thereby promoting linguistic fairness. Overall, the proposed framework establishes a new potential paradigm for brain decoding, opening new paths for broader applications of BCI.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Authors:
Yan Gao,
Massimo Roberto Scamarcia,
Javier Fernandez-Marques,
Mohammad Naseri,
Chong Shen Ng,
Dimitris Stripelis,
Zexi Li,
Tao Shen,
Jiamu Bai,
Daoyuan Chen,
Zikai Zhang,
Rui Hu,
InSeo Song,
Lee KangYoon,
Hong Jia,
Ting Dang,
Junyan Wang,
Zheyuan Liu,
Daniel Janes Beutel,
Lingjuan Lyu,
Nicholas D. Lane
Abstract:
Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning o…
▽ More
Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning on pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings remain largely under explored. We introduce the FlowerTune LLM Leaderboard, a first-of-its-kind benchmarking suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes federated instruction-tuning datasets and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source and community-driven approach, provide the first comprehensive comparison across 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, offering actionable insights into model performance, resource constraints, and domain adaptation. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.
△ Less
Submitted 3 June, 2025;
originally announced June 2025.
-
Rethinking Machine Unlearning in Image Generation Models
Authors:
Renyang Liu,
Wenjie Feng,
Tianwei Zhang,
Wei Zhou,
Xueqi Cheng,
See-Kiong Ng
Abstract:
With the surge and widespread application of image generation models, data privacy and content safety have become major concerns and attracted great attention from users, service providers, and policymakers. Machine unlearning (MU) is recognized as a cost-effective and promising means to address these challenges. Despite some advancements, image generation model unlearning (IGMU) still faces remar…
▽ More
With the surge and widespread application of image generation models, data privacy and content safety have become major concerns and attracted great attention from users, service providers, and policymakers. Machine unlearning (MU) is recognized as a cost-effective and promising means to address these challenges. Despite some advancements, image generation model unlearning (IGMU) still faces remarkable gaps in practice, e.g., unclear task discrimination and unlearning guidelines, lack of an effective evaluation framework, and unreliable evaluation metrics. These can hinder the understanding of unlearning mechanisms and the design of practical unlearning algorithms. We perform exhaustive assessments over existing state-of-the-art unlearning algorithms and evaluation standards, and discover several critical flaws and challenges in IGMU tasks. Driven by these limitations, we make several core contributions, to facilitate the comprehensive understanding, standardized categorization, and reliable evaluation of IGMU. Specifically, (1) We design CatIGMU, a novel hierarchical task categorization framework. It provides detailed implementation guidance for IGMU, assisting in the design of unlearning algorithms and the construction of testbeds. (2) We introduce EvalIGMU, a comprehensive evaluation framework. It includes reliable quantitative metrics across five critical aspects. (3) We construct DataIGM, a high-quality unlearning dataset, which can be used for extensive evaluations of IGMU, training content detectors for judgment, and benchmarking the state-of-the-art unlearning algorithms. With EvalIGMU and DataIGM, we discover that most existing IGMU algorithms cannot handle the unlearning well across different evaluation dimensions, especially for preservation and robustness. Code and models are available at https://github.com/ryliu68/IGMU.
△ Less
Submitted 6 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
NextQuill: Causal Preference Modeling for Enhancing LLM Personalization
Authors:
Xiaoyan Zhao,
Juntao You,
Yang Zhang,
Wenjie Wang,
Hong Cheng,
Fuli Feng,
See-Kiong Ng,
Tat-Seng Chua
Abstract:
Personalizing large language models (LLMs) for individual users has become increasingly important as they are progressively integrated into real-world applications to support users' daily lives. However, existing personalization approaches often fail to distinguish which components of model predictions and training data truly reflect user preferences, leading to superficial personalization alignme…
▽ More
Personalizing large language models (LLMs) for individual users has become increasingly important as they are progressively integrated into real-world applications to support users' daily lives. However, existing personalization approaches often fail to distinguish which components of model predictions and training data truly reflect user preferences, leading to superficial personalization alignment. In this paper, we introduce NextQuill, a novel LLM personalization alignment framework grounded in causal preference modeling. We approach personalization from a causal perspective, treating both model predictions and ground-truth data generation as outcomes influenced by user preferences, along with other factors. We define the true preference effect as the causal impact of user history (which reflects preferences) on each token prediction or data generation instance, estimated through causal intervention techniques. Building on this insight, NextQuill introduces two complementary alignment strategies: (1) aligning model-internal causal preference effects on predictions with those reflected in ground-truth data, rather than indiscriminately fitting predictions, and (2) focusing on fitting preference-bearing tokens identified via ground-truth data preference effects, rather than treating all tokens uniformly. By integrating these strategies, NextQuill shifts the alignment process toward learning from causal preference effects, facilitating more effective and personalized adaptation. Experiments across multiple personalization benchmarks demonstrate that NextQuill significantly improves personalization quality, offering a principled, causal foundation for LLM personalization. Our codes are available on https://github.com/juntaoyou/NextQuill.
△ Less
Submitted 2 June, 2025;
originally announced June 2025.
-
TSRating: Rating Quality of Diverse Time Series Data by Meta-learning from LLM Judgment
Authors:
Shunyu Wu,
Dan Li,
Haozheng Ye,
Zhuomin Chen,
Jiahui Zhou,
Jian Lou,
Zibin Zheng,
See-Kiong Ng
Abstract:
High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by extending data quality rating techniques such as influence functions and Shapley values to account for temporal characteristics. However, they neglect the fact th…
▽ More
High-quality time series (TS) data are essential for ensuring TS model performance, rendering research on rating TS data quality indispensable. Existing methods have shown promising rating accuracy within individual domains, primarily by extending data quality rating techniques such as influence functions and Shapley values to account for temporal characteristics. However, they neglect the fact that real-world TS data can span vastly different domains and exhibit distinct properties, hampering the accurate and efficient rating of diverse TS data. In this paper, we propose TSRating, a novel and unified framework for rating the quality of time series data crawled from diverse domains. TSRating is built on the assumption that LLMs inherit ample knowledge, acquired during their extensive pretraining, enabling them to comprehend and discern quality differences in diverse TS data. We verify this assumption by devising a series of prompts to elicit quality comparisons from LLMs for pairs of TS samples. We then fit a dedicated rating model, termed TSRater, to convert the LLMs' judgments into efficient quality predictions via TSRater's inference on future TS samples. To ensure cross-domain adaptability, we develop a meta-learning scheme to train TSRater on quality comparisons collected from nine distinct domains. To improve training efficiency, we employ signSGD for inner-loop updates, thus circumventing the demanding computation of hypergradients. Extensive experimental results on eleven benchmark datasets across three time series tasks, each using both conventional TS models and TS foundation models, demonstrate that TSRating outperforms baselines in terms of estimation accuracy, efficiency, and domain adaptability.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Söze: One Network Telemetry Is All You Need for Per-flow Weighted Bandwidth Allocation at Scale
Authors:
Weitao Wang,
T. S. Eugene Ng
Abstract:
Weighted bandwidth allocation is a powerful abstraction that has a wide range of use cases in modern data center networks. However, realizing highly agile and precise weighted bandwidth allocation for large-scale cloud environments is fundamentally challenging. In this paper, we propose Söze, a lightweight decentralized weighted bandwidth allocation system that leverages simple network telemetry f…
▽ More
Weighted bandwidth allocation is a powerful abstraction that has a wide range of use cases in modern data center networks. However, realizing highly agile and precise weighted bandwidth allocation for large-scale cloud environments is fundamentally challenging. In this paper, we propose Söze, a lightweight decentralized weighted bandwidth allocation system that leverages simple network telemetry features of commodity Ethernet switches. Given the flow weights, Söze can effectively use the telemetry information to compute and enforce the weighted bandwidth allocations without per-flow, topology, or routing knowledge. We demonstrate the effectiveness of Söze through simulations and testbed experiments, improving TPC-H jobs completion time by up to $0.59\times$ and $0.79\times$ on average.
△ Less
Submitted 5 June, 2025; v1 submitted 1 June, 2025;
originally announced June 2025.
-
Enabling Secure and Ephemeral AI Workloads in Data Mesh Environments
Authors:
Chinkit Patel,
Kee Siong Ng
Abstract:
Many large enterprises that operate highly governed and complex ICT environments have no efficient and effective way to support their Data and AI teams in rapidly spinning up and tearing down self-service data and compute infrastructure, to experiment with new data analytic tools, and deploy data products into operational use. This paper proposes a key piece of the solution to the overall problem,…
▽ More
Many large enterprises that operate highly governed and complex ICT environments have no efficient and effective way to support their Data and AI teams in rapidly spinning up and tearing down self-service data and compute infrastructure, to experiment with new data analytic tools, and deploy data products into operational use. This paper proposes a key piece of the solution to the overall problem, in the form of an on-demand self-service data-platform infrastructure to empower de-centralised data teams to build data products on top of centralised templates, policies and governance. The core innovation is an efficient method to leverage immutable container operating systems and infrastructure-as-code methodologies for creating, from scratch, vendor-neutral and short-lived Kubernetes clusters on-premises and in any cloud environment. Our proposed approach can serve as a repeatable, portable and cost-efficient alternative or complement to commercial Platform-as-a-Service (PaaS) offerings, and this is particularly important in supporting interoperability in complex data mesh environments with a mix of modern and legacy compute infrastructure.
△ Less
Submitted 30 May, 2025;
originally announced June 2025.
-
The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models
Authors:
Junyi Li,
Hwee Tou Ng
Abstract:
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization, achieving impressive capabilities across various challenging benchmarks. However, our empirical analysis reveals a critical drawback: reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We theoretically analyze the RL training dynamic…
▽ More
Large language models (LLMs) have significantly advanced in reasoning tasks through reinforcement learning (RL) optimization, achieving impressive capabilities across various challenging benchmarks. However, our empirical analysis reveals a critical drawback: reasoning-oriented RL fine-tuning significantly increases the prevalence of hallucinations. We theoretically analyze the RL training dynamics, identifying high-variance gradient, entropy-induced randomness, and susceptibility to spurious local optima as key factors leading to hallucinations. To address this drawback, we propose Factuality-aware Step-wise Policy Optimization (FSPO), an innovative RL fine-tuning algorithm incorporating explicit factuality verification at each reasoning step. FSPO leverages automated verification against given evidence to dynamically adjust token-level advantage values, incentivizing factual correctness throughout the reasoning process. Experiments across mathematical reasoning and hallucination benchmarks using Qwen2.5 and Llama models demonstrate that FSPO effectively reduces hallucinations while enhancing reasoning accuracy, substantially improving both reliability and performance.
△ Less
Submitted 30 May, 2025;
originally announced May 2025.
-
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Authors:
Mingzhe Du,
Luu Anh Tuan,
Yue Liu,
Yuhao Qing,
Dong Huang,
Xinyi He,
Qian Liu,
Zejun Ma,
See-kiong Ng
Abstract:
Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore t…
▽ More
Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore three training strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Experiments on our Venus dataset and the APPS benchmark show that SFT and DPO rapidly saturate in efficiency gains. In contrast, GRPO, using reinforcement learning (RL) with execution feedback, continuously optimizes code performance, significantly boosting both pass@1 (from 47% to 62%) and the likelihood of outperforming human submissions in efficiency (from 31% to 45%). Our work demonstrates effective test-time code efficiency improvement and critically reveals the power of RL in teaching LLMs to truly self-improve code efficiency.
△ Less
Submitted 3 June, 2025; v1 submitted 29 May, 2025;
originally announced May 2025.
-
How Does Response Length Affect Long-Form Factuality
Authors:
James Xu Zhao,
Jimmy Z. J. Liu,
Bryan Hooi,
See-Kiong Ng
Abstract:
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in the responses would undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic and bi-level long-form factuality evalua…
▽ More
Large language models (LLMs) are widely used for long-form text generation. However, factual errors in the responses would undermine their reliability. Despite growing attention to LLM factuality, the effect of response length on factuality remains underexplored. In this work, we systematically investigate this relationship by first introducing an automatic and bi-level long-form factuality evaluation framework, which achieves high agreement with human annotations while being cost-effective. Using this framework, we conduct controlled experiments and find that longer responses exhibit lower factual precision, confirming the presence of length bias. To explain this phenomenon, we empirically examine three hypotheses: error propagation, long context, and facts exhaustion. Our results reveal that facts exhaustion, where the model gradually exhausts more reliable knowledge, is the primary cause of factual degradation, rather than the other two hypotheses.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
Authors:
Chengqi Zheng,
Jianda Chen,
Yueming Lyu,
Wen Zheng Terence Ng,
Haopeng Zhang,
Yew-Soon Ong,
Ivor Tsang,
Haiyan Yin
Abstract:
Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represent workflows as a verifiable intermediate representation using…
▽ More
Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represent workflows as a verifiable intermediate representation using Mermaid, a structured and human-interpretable graph language. We formulate domain-aware evolutionary operators, i.e., crossover, mutation, insertion, and deletion, to preserve semantic correctness while promoting structural diversity, enabling efficient exploration of a high-quality, statically verifiable workflow space. Without modifying task settings or evaluation protocols, MermaidFlow achieves consistent improvements in success rates and faster convergence to executable plans on the agent reasoning benchmark. The experimental results demonstrate that safety-constrained graph evolution offers a scalable, modular foundation for robust and interpretable agentic reasoning systems.
△ Less
Submitted 28 May, 2025;
originally announced May 2025.
-
ConnectomeDiffuser: Generative AI Enables Brain Network Construction from Diffusion Tensor Imaging
Authors:
Xuhang Chen,
Michael Kwok-Po Ng,
Kim-Fung Tsang,
Chi-Man Pun,
Shuqiang Wang
Abstract:
Brain network analysis plays a crucial role in diagnosing and monitoring neurodegenerative disorders such as Alzheimer's disease (AD). Existing approaches for constructing structural brain networks from diffusion tensor imaging (DTI) often rely on specialized toolkits that suffer from inherent limitations: operator subjectivity, labor-intensive workflows, and restricted capacity to capture complex…
▽ More
Brain network analysis plays a crucial role in diagnosing and monitoring neurodegenerative disorders such as Alzheimer's disease (AD). Existing approaches for constructing structural brain networks from diffusion tensor imaging (DTI) often rely on specialized toolkits that suffer from inherent limitations: operator subjectivity, labor-intensive workflows, and restricted capacity to capture complex topological features and disease-specific biomarkers. To overcome these challenges and advance computational neuroimaging instrumentation, ConnectomeDiffuser is proposed as a novel diffusion-based framework for automated end-to-end brain network construction from DTI. The proposed model combines three key components: (1) a Template Network that extracts topological features from 3D DTI scans using Riemannian geometric principles, (2) a diffusion model that generates comprehensive brain networks with enhanced topological fidelity, and (3) a Graph Convolutional Network classifier that incorporates disease-specific markers to improve diagnostic accuracy. ConnectomeDiffuser demonstrates superior performance by capturing a broader range of structural connectivity and pathology-related information, enabling more sensitive analysis of individual variations in brain networks. Experimental validation on datasets representing two distinct neurodegenerative conditions demonstrates significant performance improvements over other brain network methods. This work contributes to the advancement of instrumentation in the context of neurological disorders, providing clinicians and researchers with a robust, generalizable measurement framework that facilitates more accurate diagnosis, deeper mechanistic understanding, and improved therapeutic monitoring of neurodegenerative diseases such as AD.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment
Authors:
Xiaoqiang Lin,
Arun Verma,
Zhongxiang Dai,
Daniela Rus,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
The recent success of using human preferences to align large language models (LLMs) has significantly improved their performance in various downstream tasks like question answering, mathematical reasoning, and code generation. However,3 achieving effective LLM alignment depends on high-quality human preference datasets. Collecting these datasets requires human preference annotation, which is costl…
▽ More
The recent success of using human preferences to align large language models (LLMs) has significantly improved their performance in various downstream tasks like question answering, mathematical reasoning, and code generation. However,3 achieving effective LLM alignment depends on high-quality human preference datasets. Collecting these datasets requires human preference annotation, which is costly and resource-intensive, necessitating efficient active data selection methods. Existing methods either lack a strong theoretical foundation or depend on restrictive reward function assumptions (e.g., linearity). To this end, we propose an algorithm, ActiveDPO, that uses a theoretically grounded data selection criterion for non-linear reward functions while directly leveraging the LLM itself to parameterize the reward model that is used for active data selection. As a result, ActiveDPO explicitly accounts for the influence of LLM on data selection, unlike methods that select the data without considering the LLM that is being aligned, thereby leading to more effective and efficient data collection. Extensive experiments show that ActiveDPO outperforms existing methods across various models and datasets.
△ Less
Submitted 25 May, 2025;
originally announced May 2025.
-
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Authors:
Xiaohao Liu,
Xiaobo Xia,
Weixiang Zhao,
Manyi Zhang,
Xianzhi Yu,
Xiu Su,
Shuo Yang,
See-Kiong Ng,
Tat-Seng Chua
Abstract:
Large language models (LLMs) have achieved notable progress. Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to its inherently sequential process. To overcome these challenges, we propose leap multi-token prediction~(L-MTP), an innovative token prediction method that exte…
▽ More
Large language models (LLMs) have achieved notable progress. Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to its inherently sequential process. To overcome these challenges, we propose leap multi-token prediction~(L-MTP), an innovative token prediction method that extends the capabilities of multi-token prediction (MTP) by introducing a leap-based mechanism. Unlike conventional MTP, which generates multiple tokens at adjacent positions, L-MTP strategically skips over intermediate tokens, predicting non-sequential ones in a single forward pass. This structured leap not only enhances the model's ability to capture long-range dependencies but also enables a decoding strategy specially optimized for non-sequential leap token generation, effectively accelerating inference. We theoretically demonstrate the benefit of L-MTP in improving inference efficiency. Experiments across diverse benchmarks validate its merit in boosting both LLM performance and inference speed. The source code will be publicly available.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
CaseReportBench: An LLM Benchmark Dataset for Dense Information Extraction in Clinical Case Reports
Authors:
Xiao Yu Cindy Zhang,
Carlos R. Ferreira,
Francis Rossignol,
Raymond T. Ng,
Wyeth Wasserman,
Jian Zhu
Abstract:
Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports b…
▽ More
Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports, focusing on IEMs. Using this dataset, we assess various models and prompting strategies, introducing novel approaches such as category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought prompting offers little advantage over standard zero-shot prompting. Category-specific prompting improves alignment with the benchmark. The open-source model Qwen2.5-7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management. We also highlight areas for improvement, such as LLMs' limitations in recognizing negative findings important for differential diagnosis. This work advances LLM-driven clinical natural language processing and paves the way for scalable medical AI applications.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy
Authors:
Chi Kit Ng,
Long Bai,
Guankun Wang,
Yupeng Wang,
Huxin Gao,
Kun Yuan,
Chenhan Jin,
Tieyong Zeng,
Hongliang Ren
Abstract:
In endoscopic procedures, autonomous tracking of abnormal regions and following circumferential cutting markers can significantly reduce the cognitive burden on endoscopists. However, conventional model-based pipelines are fragile for each component (e.g., detection, motion planning) requires manual tuning and struggles to incorporate high-level endoscopic intent, leading to poor generalization ac…
▽ More
In endoscopic procedures, autonomous tracking of abnormal regions and following circumferential cutting markers can significantly reduce the cognitive burden on endoscopists. However, conventional model-based pipelines are fragile for each component (e.g., detection, motion planning) requires manual tuning and struggles to incorporate high-level endoscopic intent, leading to poor generalization across diverse scenes. Vision-Language-Action (VLA) models, which integrate visual perception, language grounding, and motion planning within an end-to-end framework, offer a promising alternative by semantically adapting to surgeon prompts without manual recalibration. Despite their potential, applying VLA models to robotic endoscopy presents unique challenges due to the complex and dynamic anatomical environments of the gastrointestinal (GI) tract. To address this, we introduce EndoVLA, designed specifically for continuum robots in GI interventions. Given endoscopic images and surgeon-issued tracking prompts, EndoVLA performs three core tasks: (1) polyp tracking, (2) delineation and following of abnormal mucosal regions, and (3) adherence to circular markers during circumferential cutting. To tackle data scarcity and domain shifts, we propose a dual-phase strategy comprising supervised fine-tuning on our EndoVLA-Motion dataset and reinforcement fine-tuning with task-aware rewards. Our approach significantly improves tracking performance in endoscopy and enables zero-shot generalization in diverse scenes and complex sequential tasks.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search
Authors:
Sunhao Dai,
Wenjie Wang,
Liang Pang,
Jun Xu,
See-Kiong Ng,
Ji-Rong Wen,
Tat-Seng Chua
Abstract:
Generative AI search is reshaping information retrieval by offering end-to-end answers to complex queries, reducing users' reliance on manually browsing and summarizing multiple web pages. However, while this paradigm enhances convenience, it disrupts the feedback-driven improvement loop that has historically powered the evolution of traditional Web search. Web search can continuously improve thei…
▽ More
Generative AI search is reshaping information retrieval by offering end-to-end answers to complex queries, reducing users' reliance on manually browsing and summarizing multiple web pages. However, while this paradigm enhances convenience, it disrupts the feedback-driven improvement loop that has historically powered the evolution of traditional Web search. Web search can continuously improve their ranking models by collecting large-scale, fine-grained user feedback (e.g., clicks, dwell time) at the document level. In contrast, generative AI search operates through a much longer search pipeline, spanning query decomposition, document retrieval, and answer generation, yet typically receives only coarse-grained feedback on the final answer. This introduces a feedback loop disconnect, where user feedback for the final output cannot be effectively mapped back to specific system components, making it difficult to improve each intermediate stage and sustain the feedback loop. In this paper, we envision NExT-Search, a next-generation paradigm designed to reintroduce fine-grained, process-level feedback into generative AI search. NExT-Search integrates two complementary modes: User Debug Mode, which allows engaged users to intervene at key stages; and Shadow User Mode, where a personalized user agent simulates user preferences and provides AI-assisted feedback for less interactive users. Furthermore, we envision how these feedback signals can be leveraged through online adaptation, which refines current search outputs in real-time, and offline update, which aggregates interaction logs to periodically fine-tune query decomposition, retrieval, and generation models. By restoring human control over key stages of the generative AI search pipeline, we believe NExT-Search offers a promising direction for building feedback-rich AI search systems that can evolve continuously alongside human feedback.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Investigating and Enhancing the Robustness of Large Multimodal Models Against Temporal Inconsistency
Authors:
Jiafeng Liang,
Shixin Jiang,
Xuan Dong,
Ning Wang,
Zheng Chu,
Hui Su,
Jinlan Fu,
Ming Liu,
See-Kiong Ng,
Bing Qin
Abstract:
Large Multimodal Models (LMMs) have recently demonstrated impressive performance on general video comprehension benchmarks. Nevertheless, for broader applications, the robustness of their temporal analysis capability needs to be thoroughly investigated yet predominantly ignored. Motivated by this, we propose a novel temporal robustness benchmark (TemRobBench), which introduces temporal inconsisten…
▽ More
Large Multimodal Models (LMMs) have recently demonstrated impressive performance on general video comprehension benchmarks. Nevertheless, for broader applications, the robustness of their temporal analysis capability needs to be thoroughly investigated yet predominantly ignored. Motivated by this, we propose a novel temporal robustness benchmark (TemRobBench), which introduces temporal inconsistency perturbations separately at the visual and textual modalities to assess the robustness of models. We evaluate 16 mainstream LMMs and find that they exhibit over-reliance on prior knowledge and textual context in adversarial environments, while ignoring the actual temporal dynamics in the video. To mitigate this issue, we design panoramic direct preference optimization (PanoDPO), which encourages LMMs to incorporate both visual and linguistic feature preferences simultaneously. Experimental results show that PanoDPO can effectively enhance the model's robustness and reliability in temporal analysis.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation
Authors:
He Wang,
Alexander Hanbo Li,
Yiqun Hu,
Sheng Zhang,
Hideo Kobayashi,
Jiani Zhang,
Henry Zhu,
Chung-Wei Hang,
Patrick Ng
Abstract:
Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning techniques, while overlooking the importance of the order in which problems are tackled during inference. In this work, we develop a novel inference-time optim…
▽ More
Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. Recent studies primarily focus on enhancing in-context learning through improved search, sampling, and planning techniques, while overlooking the importance of the order in which problems are tackled during inference. In this work, we develop a novel inference-time optimization framework, referred to as DSMentor, which leverages curriculum learning -- a strategy that introduces simpler task first and progressively moves to more complex ones as the learner improves -- to enhance LLM agent performance in challenging data science tasks. Our mentor-guided framework organizes data science tasks in order of increasing difficulty and incorporates a growing long-term memory to retain prior experiences, guiding the agent's learning progression and enabling more effective utilization of accumulated knowledge. We evaluate DSMentor through extensive experiments on DSEval and QRData benchmarks. Experiments show that DSMentor using Claude-3.5-Sonnet improves the pass rate by up to 5.2% on DSEval and QRData compared to baseline agents. Furthermore, DSMentor demonstrates stronger causal reasoning ability, improving the pass rate by 8.8% on the causality problems compared to GPT-4 using Program-of-Thoughts prompts. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference, mirroring the human learning process and opening new avenues for improving LLM performance through curriculum-based inference optimization.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
6G communications through sub-Terahertz CMOS power amplifiers: Design challenges and trends
Authors:
Jun Yan Lee,
Duo Wu,
Xuanrui Guo,
Jian Ding Tan,
Teh Jia Yew,
Zi Neng Ng,
Mohammad Arif Sobhan Bhuiyan,
Mahdi H. Miraz
Abstract:
The fifth-generation (5G) network faces limitations in supporting emerging applications, such as artificial intelligence (AI), virtual reality (VR) and digital twins. To overcome these confines, sub-Terahertz (sub-THz) and Terahertz (THz) technologies are considered to be key enablers of effective 6G wireless communications, offering higher transmission speeds, longer range and wider bandwidth. Ac…
▽ More
The fifth-generation (5G) network faces limitations in supporting emerging applications, such as artificial intelligence (AI), virtual reality (VR) and digital twins. To overcome these confines, sub-Terahertz (sub-THz) and Terahertz (THz) technologies are considered to be key enablers of effective 6G wireless communications, offering higher transmission speeds, longer range and wider bandwidth. Achieving these capabilities requires careful engineering of 6G transceivers, with a focus on efficient power amplifiers (PAs) in the front-end, which play a critical role in effectively amplifying and transmitting signals over long distances. Complimentary metal-oxidesemiconductor (CMOS) technology-based PA in sub-THz suffers severe parasitic and limited maximum frequency, however, this has eventually been solved by different design architectures and scaling down of CMOS technology to break through the frequency limitations. In this article, we reviewed the potentials and capabilities of CMOS technology for designing 6G hardware, identified the state-of-art PA designs in the sub-THz band and then examined as well as compared the designs to identify the suitable design strategies for better performance. The circuit optimisation techniques, such as coupled-line, passive gain boosting method, zero-degree power splitting, load-pull matching, diode and capacitor linearisation for better gain, saturated output power and power added efficiency, are considered for the PA design architectures at different sub-THz bands. Furthermore, these methods are summarised and discussed with their advantages and disadvantages in lieu with their performances. The PA design trends, challenges and future perspectives are also presented and discussed. Therefore, this comprehensive review article will serve as a comparative study and reference for future PA designs for radio frequency integrated circuits (RFIC).
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Authors:
Yuhao Qing,
Boyu Zhu,
Mingzhe Du,
Zhijiang Guo,
Terry Yue Zhuo,
Qianru Zhang,
Jie M. Zhang,
Heming Cui,
Siu-Ming Yiu,
Dong Huang,
See-Kiong Ng,
Luu Anh Tuan
Abstract:
Existing code generation benchmarks primarily evaluate functional correctness, with limited focus on code efficiency and often restricted to a single language like Python. To address this gap, we introduce EffiBench-X, the first multi-language benchmark designed to measure the efficiency of LLM-generated code. EffiBench-X supports Python, C++, Java, JavaScript, Ruby, and Golang. It comprises compe…
▽ More
Existing code generation benchmarks primarily evaluate functional correctness, with limited focus on code efficiency and often restricted to a single language like Python. To address this gap, we introduce EffiBench-X, the first multi-language benchmark designed to measure the efficiency of LLM-generated code. EffiBench-X supports Python, C++, Java, JavaScript, Ruby, and Golang. It comprises competitive programming tasks with human-expert solutions as efficiency baselines. Evaluating state-of-the-art LLMs on EffiBench-X reveals that while models generate functionally correct code, they consistently underperform human experts in efficiency. Even the most efficient LLM-generated solutions (Qwen3-32B) achieve only around \textbf{62\%} of human efficiency on average, with significant language-specific variations. LLMs show better efficiency in Python, Ruby, and JavaScript than in Java, C++, and Golang. For instance, DeepSeek-R1's Python code is significantly more efficient than its Java code. These results highlight the critical need for research into LLM optimization techniques to improve code efficiency across diverse languages. The dataset and evaluation infrastructure are submitted and available at https://github.com/EffiBench/EffiBench-X.git and https://huggingface.co/datasets/EffiBench/effibench-x.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
Authors:
Thong Nguyen,
Zhiyuan Hu,
Xu Lin,
Cong-Duy Nguyen,
See-Kiong Ng,
Luu Anh Tuan
Abstract:
Recent years have witnessed outstanding advances of large vision-language models (LVLMs). In order to tackle video understanding, most of them depend upon their implicit temporal understanding capacity. As such, they have not deciphered important components that contribute to temporal understanding ability, which might limit the potential of these LVLMs for video understanding. In this work, we co…
▽ More
Recent years have witnessed outstanding advances of large vision-language models (LVLMs). In order to tackle video understanding, most of them depend upon their implicit temporal understanding capacity. As such, they have not deciphered important components that contribute to temporal understanding ability, which might limit the potential of these LVLMs for video understanding. In this work, we conduct a thorough empirical study to demystify crucial components that influence the temporal understanding of LVLMs. Our empirical study reveals that significant impacts are centered around the intermediate interface between the visual encoder and the large language model. Building on these insights, we propose a temporal-oriented recipe that encompasses temporal-oriented training schemes and an upscaled interface. Our final model developed using our recipe significantly enhances previous LVLMs on standard video understanding tasks.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
Convergence Rates of Constrained Expected Improvement
Authors:
Haowei Wang,
Jingyi Wang,
Zhongxiang Dai,
Nai-Yuan Chiang,
Szu Hui Ng,
Cosmin G. Petra
Abstract:
Constrained Bayesian optimization (CBO) methods have seen significant success in black-box optimization with constraints, and one of the most commonly used CBO methods is the constrained expected improvement (CEI) algorithm. CEI is a natural extension of the expected improvement (EI) when constraints are incorporated. However, the theoretical convergence rate of CEI has not been established. In th…
▽ More
Constrained Bayesian optimization (CBO) methods have seen significant success in black-box optimization with constraints, and one of the most commonly used CBO methods is the constrained expected improvement (CEI) algorithm. CEI is a natural extension of the expected improvement (EI) when constraints are incorporated. However, the theoretical convergence rate of CEI has not been established. In this work, we study the convergence rate of CEI by analyzing its simple regret upper bound. First, we show that when the objective function $f$ and constraint function $c$ are assumed to each lie in a reproducing kernel Hilbert space (RKHS), CEI achieves the convergence rates of $\mathcal{O} \left(t^{-\frac{1}{2}}\log^{\frac{d+1}{2}}(t) \right) \ \text{and }\ \mathcal{O}\left(t^{\frac{-ν}{2ν+d}} \log^{\fracν{2ν+d}}(t)\right)$ for the commonly used squared exponential and Matérn kernels, respectively. Second, we show that when $f$ and $c$ are assumed to be sampled from Gaussian processes (GPs), CEI achieves the same convergence rates with a high probability. Numerical experiments are performed to validate the theoretical analysis.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees
Authors:
Yannis Montreuil,
Axel Carlier,
Lai Xing Ng,
Wei Tsang Ooi
Abstract:
We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities-labels or experts-per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a si…
▽ More
We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities-labels or experts-per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a single end-to-end objective. We define a cost-sensitive loss and derive a novel convex surrogate that is independent of the cardinality parameter $k$, enabling generalization across Top-$k$ regimes without retraining. Our formulation recovers the Top-1 deferral policy of prior score-based methods as a special case, and we prove that our surrogate is both Bayes-consistent and $\mathcal{H}$-consistent under mild assumptions. We further introduce an adaptive variant, Top-$k(x)$, which dynamically selects the number of consulted entities per input to balance predictive accuracy and consultation cost. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral, while Top-$k(x)$ achieves superior accuracy-cost trade-offs by tailoring allocations to input complexity.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Rethinking Prompt Optimizers: From Prompt Merits to Optimization
Authors:
Zixiao Zhu,
Hanzhang Zhou,
Zijian Feng,
Tianjiao Li,
Chua Jia Jim Deryl,
Mak Lee Onn,
Gee Wah Ng,
Kezhi Mao
Abstract:
Prompt optimization (PO) provides a practical way to improve response quality when users lack the time or expertise to manually craft effective prompts. Existing methods typically rely on advanced, large-scale LLMs like GPT-4 to generate optimized prompts. However, due to limited downward compatibility, verbose, instruction-heavy prompts from advanced LLMs can overwhelm lightweight inference model…
▽ More
Prompt optimization (PO) provides a practical way to improve response quality when users lack the time or expertise to manually craft effective prompts. Existing methods typically rely on advanced, large-scale LLMs like GPT-4 to generate optimized prompts. However, due to limited downward compatibility, verbose, instruction-heavy prompts from advanced LLMs can overwhelm lightweight inference models and degrade response quality. In this work, we rethink prompt optimization through the lens of interpretable design. We first identify a set of model-agnostic prompt quality merits and empirically validate their effectiveness in enhancing prompt and response quality. We then introduce MePO, a merit-guided, lightweight, and locally deployable prompt optimizer trained on our preference dataset built from merit-aligned prompts generated by a lightweight LLM. Unlike prior work, MePO avoids online optimization reliance, reduces cost and privacy concerns, and, by learning clear, interpretable merits, generalizes effectively to both large-scale and lightweight inference models. Experiments demonstrate that MePO achieves better results across diverse tasks and model types, offering a scalable and robust solution for real-world deployment. The code and dataset can be found in https://github.com/MidiyaZhu/MePO
△ Less
Submitted 20 May, 2025; v1 submitted 14 May, 2025;
originally announced May 2025.
-
Architecture of Tianyu Software: Relative Photometry as a Case Study
Authors:
Yicheng Rui,
Yifan Xuan,
Shuyue Zheng,
Kexin Li,
Kaiming Cui,
Kai Xiao,
Jie Zheng,
Jun Kai Ng,
Hongxuan Jiang,
Fabo Feng,
Qinghui Sun
Abstract:
Tianyu telescope, an one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. It requires a highly automated, optimally distributed, easily extendable, and highly flexible software to enable the data processing for the raw data at rates exceeding 500MB/s. In this work, we introduce the a…
▽ More
Tianyu telescope, an one-meter robotic optical survey instrument to be constructed in Lenghu, Qinghai, China, is designed for detecting transiting exoplanets, variable stars and transients. It requires a highly automated, optimally distributed, easily extendable, and highly flexible software to enable the data processing for the raw data at rates exceeding 500MB/s. In this work, we introduce the architecture of the Tianyu pipeline and use relative photometry as a case to demonstrate its high scalability and efficiency. This pipeline is tested on the data collected from Muguang observatory and Xinglong observatory. The pipeline demonstrates high scalability, with most processing stages increasing in throughput as the number of consumers grows. Compared to a single consumer, the median throughput of image calibration, alignment, and flux extraction increases by 41%, 257%, and 107% respectively when using 5 consumers, while image stacking exhibits limited scalability due to I/O constraints. In our tests, the pipeline was able to detect two transiting sources. Besides, the pipeline captures variability in the light curves of nine known and two previously unknown variable sources in the testing data. Meanwhile, the differential photometric precision of the light curves is near the theoretical limitation. These results indicate that this pipeline is suitable for detecting transiting exoplanets and variable stars. This work builds the fundation for further development of Tianyu software. Code of this work is available at https://github.com/ruiyicheng/Tianyu_pipeline.
△ Less
Submitted 14 May, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Authors:
Dianwen Ng,
Kun Zhou,
Yi-Wen Chao,
Zhiwei Xiong,
Bin Ma,
Eng Siong Chng
Abstract:
Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) mod…
▽ More
Achieving high-fidelity audio compression while preserving perceptual quality across diverse content remains a key challenge in Neural Audio Coding (NAC). We introduce MUFFIN, a fully convolutional Neural Psychoacoustic Coding (NPC) framework that leverages psychoacoustically guided multi-band frequency reconstruction. At its core is a Multi-Band Spectral Residual Vector Quantization (MBS-RVQ) module that allocates bitrate across frequency bands based on perceptual salience. This design enables efficient compression while disentangling speaker identity from content using distinct codebooks. MUFFIN incorporates a transformer-inspired convolutional backbone and a modified snake activation to enhance resolution in fine-grained spectral regions. Experimental results on multiple benchmarks demonstrate that MUFFIN consistently outperforms existing approaches in reconstruction quality. A high-compression variant achieves a state-of-the-art 12.5 Hz rate with minimal loss. MUFFIN also proves effective in downstream generative tasks, highlighting its promise as a token representation for integration with language models. Audio samples and code are available.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Near-Field Channel Estimation for XL-MIMO: A Deep Generative Model Guided by Side Information
Authors:
Zhenzhou Jin,
Li You,
Derrick Wing Kwan Ng,
Xiang-Gen Xia,
Xiqi Gao
Abstract:
This paper investigates the near-field (NF) channel estimation (CE) for extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Considering the pronounced NF effects in XL-MIMO communications, we first establish a joint angle-distance (AD) domain-based spherical-wavefront physical channel model that captures the inherent sparsity of XL-MIMO channels. Leveraging the channel's sparsi…
▽ More
This paper investigates the near-field (NF) channel estimation (CE) for extremely large-scale multiple-input multiple-output (XL-MIMO) systems. Considering the pronounced NF effects in XL-MIMO communications, we first establish a joint angle-distance (AD) domain-based spherical-wavefront physical channel model that captures the inherent sparsity of XL-MIMO channels. Leveraging the channel's sparsity in the joint AD domain, the CE is approached as a task of reconstructing sparse signals. Anchored in this framework, we first propose a compressed sensing algorithm to acquire a preliminary channel estimate. Harnessing the powerful implicit prior learning capability of generative artificial intelligence (GenAI), we further propose a GenAI-based approach to refine the estimated channel. Specifically, we introduce the preliminary estimated channel as side information, and derive the evidence lower bound (ELBO) of the log-marginal distribution of the target NF channel conditioned on the preliminary estimated channel, which serves as the optimization objective for the proposed generative diffusion model (GDM). Additionally, we introduce a more generalized version of the GDM, the non-Markovian GDM (NM-GDM), to accelerate the sampling process, achieving an approximately tenfold enhancement in sampling efficiency. Experimental results indicate that the proposed approach is capable of offering substantial performance gain in CE compared to existing benchmark schemes within NF XL-MIMO systems. Furthermore, our approach exhibits enhanced generalization capabilities in both the NF or far-field (FF) regions.
△ Less
Submitted 11 May, 2025;
originally announced May 2025.
-
SpectrumFM: A Foundation Model for Intelligent Spectrum Management
Authors:
Fuhui Zhou,
Chunyu Liu,
Hao Zhang,
Wei Wu,
Qihui Wu,
Derrick Wing Kwan Ng,
Tony Q. S. Quek,
Chan-Byoung Chae
Abstract:
Intelligent spectrum management is crucial for improving spectrum efficiency and achieving secure utilization of spectrum resources. However, existing intelligent spectrum management methods, typically based on small-scale models, suffer from notable limitations in recognition accuracy, convergence speed, and generalization, particularly in the complex and dynamic spectrum environments. To address…
▽ More
Intelligent spectrum management is crucial for improving spectrum efficiency and achieving secure utilization of spectrum resources. However, existing intelligent spectrum management methods, typically based on small-scale models, suffer from notable limitations in recognition accuracy, convergence speed, and generalization, particularly in the complex and dynamic spectrum environments. To address these challenges, this paper proposes a novel spectrum foundation model, termed SpectrumFM, establishing a new paradigm for spectrum management. SpectrumFM features an innovative encoder architecture that synergistically exploits the convolutional neural networks and the multi-head self-attention mechanisms to enhance feature extraction and enable robust representation learning. The model is pre-trained via two novel self-supervised learning tasks, namely masked reconstruction and next-slot signal prediction, which leverage large-scale in-phase and quadrature (IQ) data to achieve comprehensive and transferable spectrum representations. Furthermore, a parameter-efficient fine-tuning strategy is proposed to enable SpectrumFM to adapt to various downstream spectrum management tasks, including automatic modulation classification (AMC), wireless technology classification (WTC), spectrum sensing (SS), and anomaly detection (AD). Extensive experiments demonstrate that SpectrumFM achieves superior performance in terms of accuracy, robustness, adaptability, few-shot learning efficiency, and convergence speed, consistently outperforming conventional methods across multiple benchmarks. Specifically, SpectrumFM improves AMC accuracy by up to 12.1% and WTC accuracy by 9.3%, achieves an area under the curve (AUC) of 0.97 in SS at -4 dB signal-to-noise ratio (SNR), and enhances AD performance by over 10%.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Authors:
Mengze Hong,
Wailing Ng,
Di Jiang,
Chen Jason Zhang
Abstract:
The rapid advancement of Chinese large language models (LLMs) underscores the need for domain-specific evaluations to ensure reliable applications. However, existing benchmarks often lack coverage in vertical domains and offer limited insights into the Chinese working context. Leveraging qualification exams as a unified framework for human expertise evaluation, we introduce QualBench, the first mu…
▽ More
The rapid advancement of Chinese large language models (LLMs) underscores the need for domain-specific evaluations to ensure reliable applications. However, existing benchmarks often lack coverage in vertical domains and offer limited insights into the Chinese working context. Leveraging qualification exams as a unified framework for human expertise evaluation, we introduce QualBench, the first multi-domain Chinese QA benchmark dedicated to localized assessment of Chinese LLMs. The dataset includes over 17,000 questions across six vertical domains, with data selections grounded in 24 Chinese qualifications to closely align with national policies and working standards. Through comprehensive evaluation, the Qwen2.5 model outperformed the more advanced GPT-4o, with Chinese LLMs consistently surpassing non-Chinese models, highlighting the importance of localized domain knowledge in meeting qualification requirements. The best performance of 75.26% reveals the current gaps in domain coverage within model capabilities. Furthermore, we present the failure of LLM collaboration with crowdsourcing mechanisms and suggest the opportunities for multi-domain RAG knowledge enhancement and vertical domain LLM training with Federated Learning.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
WaterDrum: Watermarking for Data-centric Unlearning Metric
Authors:
Xinyang Lu,
Xinyuan Niu,
Gregory Kang Ruey Lau,
Bui Thi Cam Nhung,
Rachael Hwee Ling Sim,
Fanyu Wen,
Chuan-Sheng Foo,
See-Kiong Ng,
Bryan Kian Hsiang Low
Abstract:
Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain set have se…
▽ More
Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data from some users. However, existing utility-centric unlearning metrics (based on model utility) may fail to accurately evaluate the extent of unlearning in realistic settings such as when (a) the forget and retain set have semantically similar content, (b) retraining the model from scratch on the retain set is impractical, and/or (c) the model owner can improve the unlearning metric without directly performing unlearning on the LLM. This paper presents the first data-centric unlearning metric for LLMs called WaterDrum that exploits robust text watermarking for overcoming these limitations. We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum. Our code is available at https://github.com/lululu008/WaterDrum and our new benchmark datasets are released at https://huggingface.co/datasets/Glow-AI/WaterDrum-Ax.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
Dynamic Precoding for Near-Field Secure Communications: Implementation and Performance Analysis
Authors:
Zihao Teng,
Jiancheng An,
Christos Masouros,
Hongbin Li,
Lu Gan,
Derrick Wing Kwan Ng
Abstract:
The increase in antenna apertures and transmission frequencies in next-generation wireless networks is catalyzing advancements in near-field communications (NFC). In this paper, we investigate secure transmission in near-field multi-user multiple-input single-output (MU-MISO) scenarios. Specifically, with the advent of extremely large-scale antenna arrays (ELAA) applied in the NFC regime, the spat…
▽ More
The increase in antenna apertures and transmission frequencies in next-generation wireless networks is catalyzing advancements in near-field communications (NFC). In this paper, we investigate secure transmission in near-field multi-user multiple-input single-output (MU-MISO) scenarios. Specifically, with the advent of extremely large-scale antenna arrays (ELAA) applied in the NFC regime, the spatial degrees of freedom in the channel matrix are significantly enhanced. This creates an expanded null space that can be exploited for designing secure communication schemes. Motivated by this observation, we propose a near-field dynamic hybrid beamforming architecture incorporating artificial noise, which effectively disrupts eavesdroppers at any undesired positions, even in the absence of their channel state information (CSI). Furthermore, we comprehensively analyze the dynamic precoder's performance in terms of the average signal-to-interference-plus-noise ratio, achievable rate, secrecy capacity, secrecy outage probability, and the size of the secrecy zone. In contrast to far-field secure transmission techniques that only enhance security in the angular dimension, the proposed algorithm exploits the unique properties of spherical wave characteristics in NFC to achieve secure transmission in both the angular and distance dimensions. Remarkably, the proposed algorithm is applicable to arbitrary modulation types and array configurations. Numerical results demonstrate that the proposed method achieves approximately 20\% higher rate capacity compared to zero-forcing and the weighted minimum mean squared error precoders.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
Appeal and Scope of Misinformation Spread by AI Agents and Humans
Authors:
Lynnette Hui Xian Ng,
Wenqi Zhou,
Kathleen M. Carley
Abstract:
This work examines the influence of misinformation and the role of AI agents, called bots, on social network platforms. To quantify the impact of misinformation, it proposes two new metrics based on attributes of tweet engagement and user network position: Appeal, which measures the popularity of the tweet, and Scope, which measures the potential reach of the tweet. In addition, it analyzes 5.8 mi…
▽ More
This work examines the influence of misinformation and the role of AI agents, called bots, on social network platforms. To quantify the impact of misinformation, it proposes two new metrics based on attributes of tweet engagement and user network position: Appeal, which measures the popularity of the tweet, and Scope, which measures the potential reach of the tweet. In addition, it analyzes 5.8 million misinformation tweets on the COVID-19 vaccine discourse over three time periods: Pre-Vaccine, Vaccine Launch, and Post-Vaccine. Results show that misinformation was more prevalent during the first two periods. Human-generated misinformation tweets tend to have higher appeal and scope compared to bot-generated ones. Tweedie regression analysis reveals that human-generated misinformation tweets were most concerning during Vaccine Launch week, whereas bot-generated misinformation reached its highest appeal and scope during the Pre-Vaccine period.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
Troika algorithm: approximate optimization for accurate clique partitioning and clustering of weighted networks
Authors:
Samin Aref,
Boris Ng
Abstract:
Clique partitioning is a fundamental network clustering task, with applications in a wide range of computational sciences. It involves identifying an optimal partition of the nodes for a real-valued weighted graph according to the edge weights. An optimal partition is one that maximizes the sum of within-cluster edge weights over all possible node partitions. This paper introduces a novel approxim…
▽ More
Clique partitioning is a fundamental network clustering task, with applications in a wide range of computational sciences. It involves identifying an optimal partition of the nodes for a real-valued weighted graph according to the edge weights. An optimal partition is one that maximizes the sum of within-cluster edge weights over all possible node partitions. This paper introduces a novel approximation algorithm named Troika to solve this NP-hard problem in small to mid-sized networks for instances of theoretical and practical relevance. Troika uses a branch-and-cut scheme for branching on node triples to find a partition that is within a user-specified optimality gap tolerance. Troika offers advantages over alternative methods like integer programming solvers and heuristics for clique partitioning. Unlike existing heuristics, Troika returns solutions within a guaranteed proximity to global optimality. And our results indicate that Troika is faster than using the state-of-the-art integer programming solver Gurobi for most benchmark instances. Besides its advantages for solving the clique partitioning problem, we demonstrate the applications of Troika in community detection and portfolio analysis. Troika returns partitions with higher proximity to optimal compared to eight modularity-based community detection algorithms. When used on networks of correlations among stocks, Troika reveals the dynamic changes in the structure of portfolio networks including downturns from the 2008 financial crisis and the reaction to the COVID-19 pandemic. Our comprehensive results based on benchmarks from the literature and new real and random networks point to Troika as a reliable and accurate method for solving clique partitioning instances with up to 5000 edges on standard hardware.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
Authors:
Chong Zhang,
Yue Deng,
Xiang Lin,
Bin Wang,
Dianwen Ng,
Hai Ye,
Xingxuan Li,
Yao Xiao,
Zhanfeng Mo,
Qi Zhang,
Lidong Bing
Abstract:
The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open…
▽ More
The recent development of reasoning language models (RLMs) represents a novel evolution in large language models. In particular, the recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. However, the implementation details of the released models have not been fully open-sourced by DeepSeek, including DeepSeek-R1-Zero, DeepSeek-R1, and the distilled small models. As a result, many replication studies have emerged aiming to reproduce the strong performance achieved by DeepSeek-R1, reaching comparable performance through similar training procedures and fully open-source data resources. These works have investigated feasible strategies for supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR), focusing on data preparation and method design, yielding various valuable insights. In this report, we provide a summary of recent replication studies to inspire future research. We primarily focus on SFT and RLVR as two main directions, introducing the details for data construction, method design and training procedure of current replication studies. Moreover, we conclude key findings from the implementation details and experimental results reported by these studies, anticipating to inspire future research. We also discuss additional techniques of enhancing RLMs, highlighting the potential of expanding the application scope of these models, and discussing the challenges in development. By this survey, we aim to help researchers and developers of RLMs stay updated with the latest advancements, and seek to inspire new ideas to further enhance RLMs.
△ Less
Submitted 15 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Deep Learning Optimization Using Self-Adaptive Weighted Auxiliary Variables
Authors:
Yaru Liu,
Yiqi Gu,
Michael K. Ng
Abstract:
In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning because of the high non-convexity of loss functions and the vanishing gradient issue. Our idea is to introduce auxiliary variables to separate the layers of the de…
▽ More
In this paper, we develop a new optimization framework for the least squares learning problem via fully connected neural networks or physics-informed neural networks. The gradient descent sometimes behaves inefficiently in deep learning because of the high non-convexity of loss functions and the vanishing gradient issue. Our idea is to introduce auxiliary variables to separate the layers of the deep neural networks and reformulate the loss functions for ease of optimization. We design the self-adaptive weights to preserve the consistency between the reformulated loss and the original mean squared loss, which guarantees that optimizing the new loss helps optimize the original problem. Numerical experiments are presented to verify the consistency and show the effectiveness and robustness of our models over gradient descent.
△ Less
Submitted 30 April, 2025;
originally announced April 2025.
-
Quaternion Nuclear Norms Over Frobenius Norms Minimization for Robust Matrix Completion
Authors:
Yu Guo,
Guoqing Chen,
Tieyong Zeng,
Qiyu Jin,
Michael Kwok-Po Ng
Abstract:
Recovering hidden structures from incomplete or noisy data remains a pervasive challenge across many fields, particularly where multi-dimensional data representation is essential. Quaternion matrices, with their ability to naturally model multi-dimensional data, offer a promising framework for this problem. This paper introduces the quaternion nuclear norm over the Frobenius norm (QNOF) as a novel…
▽ More
Recovering hidden structures from incomplete or noisy data remains a pervasive challenge across many fields, particularly where multi-dimensional data representation is essential. Quaternion matrices, with their ability to naturally model multi-dimensional data, offer a promising framework for this problem. This paper introduces the quaternion nuclear norm over the Frobenius norm (QNOF) as a novel nonconvex approximation for the rank of quaternion matrices. QNOF is parameter-free and scale-invariant. Utilizing quaternion singular value decomposition, we prove that solving the QNOF can be simplified to solving the singular value $L_1/L_2$ problem. Additionally, we extend the QNOF to robust quaternion matrix completion, employing the alternating direction multiplier method to derive solutions that guarantee weak convergence under mild conditions. Extensive numerical experiments validate the proposed model's superiority, consistently outperforming state-of-the-art quaternion methods.
△ Less
Submitted 30 April, 2025;
originally announced April 2025.
-
A Unified QoS-Aware Multiplexing Framework for Next Generation Immersive Communication with Legacy Wireless Applications
Authors:
Jihong Li,
Shunqing Zhang,
Tao Yu,
Guangjin Pan,
Kaixuan Huang,
Xiaojing Chen,
Yanzan Sun,
Junyu Liu,
Jiandong Li,
Derrick Wing Kwan Ng
Abstract:
Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically…
▽ More
Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically isolate the above types of wireless applications through different network slices may lead to throughput degradation and increased queue backlog. To address these challenges, we establish a unified QoS-aware framework that supports immersive communication and legacy wireless applications simultaneously. Based on the Lyapunov drift theorem, we transform the original long-term throughput maximization problem into an equivalent short-term throughput maximization weighted by virtual queue length. Moreover, to cope with the challenges introduced by the interaction between large-timescale network slicing and short-timescale resource allocation, we propose an adaptive adversarial slicing (Ad2S) scheme for networks with invarying channel statistics. To track the network channel variations, we also propose a measurement extrapolation-Kalman filter (ME-KF)-based method and refine our scheme into Ad2S-non-stationary refinement (Ad2S-NR). Through extended numerical examples, we demonstrate that our proposed schemes achieve 3.86 Mbps throughput improvement and 63.96% latency reduction with 24.36% convergence time reduction. Within our framework, the trade-off between total throughput and user service experience can be achieved by tuning systematic parameters.
△ Less
Submitted 2 May, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
Authors:
Lovedeep Gondara,
Jonathan Simkin,
Graham Sayle,
Shebnum Devji,
Gregory Arbour,
Raymond Ng
Abstract:
This study aims to guide language model selection by investigating: 1) the necessity of finetuning versus zero-shot usage, 2) the benefits of domain-adjacent versus generic pretrained models, 3) the value of further domain-specific pretraining, and 4) the continued relevance of Small Language Models (SLMs) compared to Large Language Models (LLMs) for specific tasks. Using electronic pathology repo…
▽ More
This study aims to guide language model selection by investigating: 1) the necessity of finetuning versus zero-shot usage, 2) the benefits of domain-adjacent versus generic pretrained models, 3) the value of further domain-specific pretraining, and 4) the continued relevance of Small Language Models (SLMs) compared to Large Language Models (LLMs) for specific tasks. Using electronic pathology reports from the British Columbia Cancer Registry (BCCR), three classification scenarios with varying difficulty and data size are evaluated. Models include various SLMs and an LLM. SLMs are evaluated both zero-shot and finetuned; the LLM is evaluated zero-shot only. Finetuning significantly improved SLM performance across all scenarios compared to their zero-shot results. The zero-shot LLM outperformed zero-shot SLMs but was consistently outperformed by finetuned SLMs. Domain-adjacent SLMs generally performed better than the generic SLM after finetuning, especially on harder tasks. Further domain-specific pretraining yielded modest gains on easier tasks but significant improvements on the complex, data-scarce task. The results highlight the critical role of finetuning for SLMs in specialized domains, enabling them to surpass zero-shot LLM performance on targeted classification tasks. Pretraining on domain-adjacent or domain-specific data provides further advantages, particularly for complex problems or limited finetuning data. While LLMs offer strong zero-shot capabilities, their performance on these specific tasks did not match that of appropriately finetuned SLMs. In the era of LLMs, SLMs remain relevant and effective, offering a potentially superior performance-resource trade-off compared to LLMs.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
DDPS: Discrete Diffusion Posterior Sampling for Paths in Layered Graphs
Authors:
Hao Luan,
See-Kiong Ng,
Chun Kai Ling
Abstract:
Diffusion models form an important class of generative models today, accounting for much of the state of the art in cutting edge AI research. While numerous extensions beyond image and video generation exist, few of such approaches address the issue of explicit constraints in the samples generated. In this paper, we study the problem of generating paths in a layered graph (a variant of a directed…
▽ More
Diffusion models form an important class of generative models today, accounting for much of the state of the art in cutting edge AI research. While numerous extensions beyond image and video generation exist, few of such approaches address the issue of explicit constraints in the samples generated. In this paper, we study the problem of generating paths in a layered graph (a variant of a directed acyclic graph) using discrete diffusion models, while guaranteeing that our generated samples are indeed paths. Our approach utilizes a simple yet effective representation for paths which we call the padded adjacency-list matrix (PALM). In addition, we show how to effectively perform classifier guidance, which helps steer the sampled paths to specific preferred edges without any retraining of the diffusion model. Our preliminary results show that empirically, our method outperforms alternatives which do not explicitly account for path constraints.
△ Less
Submitted 29 April, 2025;
originally announced April 2025.
-
Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization
Authors:
Jiayi Chen,
Shuai Wang,
Guoliang Li,
Wei Xu,
Guangxu Zhu,
Derrick Wing Kwan Ng,
Chengzhong Xu
Abstract:
Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. While collaboration between the two offers a promising solution, the key challenge is deciding when and how to engage the large model. To address this issue,…
▽ More
Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects. Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive. While collaboration between the two offers a promising solution, the key challenge is deciding when and how to engage the large model. To address this issue, this paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models through two key innovations. First, we propose large vision model guided model predictive control (LVM-MPC), which leverages the cloud for LVM perception and decision making. The cloud output serves as a global guidance for a local MPC, thereby forming a closed-loop perception-to-control system. Second, to determine the best timing for large model query and service, we propose collaboration timing optimization (CTO), including object detection confidence thresholding (ODCT) and cloud forward simulation (CFS), to decide when to seek cloud assistance and when to offer cloud service. Extensive experiments show that the proposed OCP outperforms existing methods in terms of both navigation time and success rate.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
-
Proof of Useful Intelligence (PoUI): Blockchain Consensus Beyond Energy Waste
Authors:
Zan-Kai Chong,
Hiroyuki Ohsaki,
Bryan Ng
Abstract:
Blockchain technology enables secure, transparent data management in decentralized systems, supporting applications from cryptocurrencies like Bitcoin to tokenizing real-world assets like property. Its scalability and sustainability hinge on consensus mechanisms balancing security and efficiency. Proof of Work (PoW), used by Bitcoin, ensures security through energy-intensive computations but deman…
▽ More
Blockchain technology enables secure, transparent data management in decentralized systems, supporting applications from cryptocurrencies like Bitcoin to tokenizing real-world assets like property. Its scalability and sustainability hinge on consensus mechanisms balancing security and efficiency. Proof of Work (PoW), used by Bitcoin, ensures security through energy-intensive computations but demands significant resources. Proof of Stake (PoS), as in Ethereum post-Merge, selects validators based on staked cryptocurrency, offering energy efficiency but risking centralization from wealth concentration. With AI models straining computational resources, we propose Proof of Useful Intelligence (PoUI), a hybrid consensus mechanism. In PoUI, workers perform AI tasks like language processing or image analysis to earn coins, which are staked to secure the network, blending security with practical utility. Decentralized nodes--job posters, market coordinators, workers, and validators --collaborate via smart contracts to manage tasks and rewards.
△ Less
Submitted 24 April, 2025;
originally announced April 2025.
-
Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation
Authors:
Zhiyuan Hu,
Shiyun Xiong,
Yifan Zhang,
See-Kiong Ng,
Anh Tuan Luu,
Bo An,
Shuicheng Yan,
Bryan Hooi
Abstract:
Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks. Despite these improvements, current frameworks often struggle to generate correct actions in challenging GUI environments. State-of-the-art commercial VLMs are black-boxes, and fine-tuning open-source VLMs for GUI tasks requires signifi…
▽ More
Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks. Despite these improvements, current frameworks often struggle to generate correct actions in challenging GUI environments. State-of-the-art commercial VLMs are black-boxes, and fine-tuning open-source VLMs for GUI tasks requires significant resources. Additionally, existing trajectory-level evaluation and refinement techniques frequently fall short due to delayed feedback and local optimization issues. To address these challenges, we propose an approach that guides VLM agents with process supervision by a reward model during GUI navigation and control at inference time. This guidance allows the VLM agent to optimize actions at each inference step, thereby improving performance in both static and dynamic environments. In particular, our method demonstrates significant performance gains in three GUI navigation tasks, achieving a 3.4% improvement in single step action accuracy for static environments, along with a around 33% increase in task success rate in one dynamic environment. With further integration of trajectory reflection and retry mechanisms, we also demonstrate even greater enhancement in task success.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.