Showing 1–50 of 262 results for author: Tan, K

Searching in archive cs.
  1. arXiv:2505.06756  [pdf, other]

    stat.ML cs.LG stat.CO

    Out-of-Sample Embedding with Proximity Data: Projection versus Restricted Reconstruction

    Authors: Michael W. Trosset, Kaiyi Tan, Minh Tang, Carey E. Priebe

    Abstract: The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that e…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 19 pages, 2 figures

  2. arXiv:2505.04089  [pdf]

    cs.NE

    A New Scope and Domain Measure Comparison Method for Global Convergence Analysis in Evolutionary Computation

    Authors: Liu-Yue Luo, Zhi-Hui Zhan, Kay Chen Tan, Jun Zhang

    Abstract: Convergence analysis is a fundamental research topic in evolutionary computation (EC). The commonly used analysis method models the EC algorithm as a homogeneous Markov chain for analysis, which is not always suitable for different EC variants, and also sometimes causes misuse and confusion due to their complex process. In this article, we categorize the existing researches on convergence analysis…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 14 pages, 8 figures

  3. arXiv:2505.03710  [pdf, other]

    stat.ML cs.AI cs.LG

    Actor-Critics Can Achieve Optimal Sample Efficiency

    Authors: Kevin Tan, Wei Fan, Yuting Wei

    Abstract: Actor-critic algorithms have become a cornerstone in reinforcement learning (RL), leveraging the strengths of both policy-based and value-based methods. Despite recent progress in understanding their statistical efficiency, no existing work has successfully learned an $ε$-optimal policy with a sample complexity of $O(1/ε^2)$ trajectories with general function approximation when strategic explorati…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML 2025

  4. arXiv:2505.02335  [pdf, ps, other]

    cs.CV

    6D Pose Estimation on Spoons and Hands

    Authors: Kevin Tan, Fan Yang, Yuhao Chen

    Abstract: Accurate dietary monitoring is essential for promoting healthier eating habits. A key area of research is how people interact and consume food using utensils and hands. By tracking their position and orientation, it is possible to estimate the volume of food being consumed, or monitor eating behaviours, highly useful insights into nutritional intake that can be more reliable than popular methods s…

    Submitted 4 May, 2025; originally announced May 2025.

  5. arXiv:2504.21463  [pdf, other]

    cs.CL

    RWKV-X: A Linear Complexity Hybrid Language Model

    Authors: Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu

    Abstract: In this paper, we introduce RWKV-X, a novel hybrid architecture that combines the efficiency of RWKV for short-range modeling with a sparse attention mechanism designed to capture long-range context. Unlike previous hybrid approaches that rely on full attention layers and retain quadratic complexity, RWKV-X achieves linear-time complexity in training and constant-time complexity in inference decod…

    Submitted 8 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

    Comments: 12 pages, typos corrected

  6. arXiv:2504.17966  [pdf, other]

    cs.RO cs.LG

    Plug-and-Play Physics-informed Learning using Uncertainty Quantified Port-Hamiltonian Models

    Authors: Kaiyuan Tan, Peilun Li, Jun Wang, Thomas Beckers

    Abstract: The ability to predict trajectories of surrounding agents and obstacles is a crucial component in many robotic applications. Data-driven approaches are commonly adopted for state prediction in scenarios where the underlying dynamics are unknown. However, the performance, reliability, and uncertainty of data-driven predictors become compromised when encountering out-of-distribution observations rel…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 7 pages, 6 figures

  7. arXiv:2504.12334  [pdf, other]

    cs.CL

    QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model

    Authors: Zongxian Yang, Jiayu Qian, Zhi-An Huang, Kay Chen Tan

    Abstract: Large language models (LLMs) face significant challenges in specialized biomedical tasks due to the inherent complexity of medical reasoning and the sensitive nature of clinical data. Existing LLMs often struggle with intricate medical terminology and the need for accurate clinical insights, leading to performance reduction when quantized for resource-constrained deployment. To address these issue…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 8 pages

  8. arXiv:2503.21839  [pdf, other]

    cs.CV cs.AI cs.LG

    M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?

    Authors: Haolong Yan, Kaijun Tan, Yeqing Shen, Xin Huang, Zheng Ge, Xiangyu Zhang, Si Li, Daxin Jiang

    Abstract: We investigate a critical yet under-explored question in Large Vision-Language Models (LVLMs): Do LVLMs genuinely comprehend interleaved image-text in the document? Existing document understanding benchmarks often assess LVLMs using question-answer formats, which are information-sparse and difficult to guarantee the coverage of long-range dependencies. To address this issue, we introduce a novel a…

    Submitted 27 March, 2025; originally announced March 2025.

  9. arXiv:2503.21156  [pdf, other]

    cs.NE

    A Theoretical Analysis of Analogy-Based Evolutionary Transfer Optimization

    Authors: Xiaoming Xue, Liang Feng, Yinglan Feng, Rui Liu, Kai Zhang, Kay Chen Tan

    Abstract: Evolutionary transfer optimization (ETO) has been gaining popularity in research over the years due to its outstanding knowledge transfer ability to address various challenges in optimization. However, a pressing issue in this field is that the invention of new ETO algorithms has far outpaced the development of fundamental theories needed to clearly understand the key factors contributing to the s…

    Submitted 27 March, 2025; originally announced March 2025.

  10. arXiv:2503.14456  [pdf, other]

    cs.CL cs.AI cs.LG

    RWKV-7 "Goose" with Expressive Dynamic State Evolution

    Authors: Bo Peng, Ruichong Zhang, Daniel Goldstein, Eric Alcaide, Xingjian Du, Haowen Hou, Jiaju Lin, Jiaxing Liu, Janna Lu, William Merrill, Guangyu Song, Kaifeng Tan, Saiteja Utpala, Nathan Wilce, Johan S. Wind, Tianyi Wu, Daniel Wuttke, Christian Zhou-Zheng

    Abstract: We present RWKV-7 "Goose", a new sequence modeling architecture with constant memory usage and constant inference time per token. Despite being trained on dramatically fewer tokens than other top models, our 2.9 billion parameter language model achieves a new 3B SoTA on multilingual tasks and matches the current 3B SoTA on English language downstream performance. RWKV-7 introduces a newly generali…

    Submitted 30 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    ACM Class: I.2.0; I.2.7

  11. arXiv:2503.11251  [pdf, other]

    cs.CV cs.CL

    Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

    Authors: Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong, et al. (29 additional authors not shown)

    Abstract: We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results de…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  12. arXiv:2502.10248  [pdf, other]

    cs.CV cs.CL

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    Authors: Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, et al. (90 additional authors not shown)

    Abstract: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded…

    Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 14 figures

  13. arXiv:2502.09449  [pdf, other]

    cs.NE

    Spiking Neural Networks for Temporal Processing: Status Quo and Future Prospects

    Authors: Chenxiang Ma, Xinyi Chen, Yanchen Li, Qu Yang, Yujie Wu, Guoqi Li, Gang Pan, Huajin Tang, Kay Chen Tan, Jibin Wu

    Abstract: Temporal processing is fundamental for both biological and artificial intelligence systems, as it enables the comprehension of dynamic environments and facilitates timely responses. Spiking Neural Networks (SNNs) excel in handling such data with high efficiency, owing to their rich neuronal dynamics and sparse activity patterns. Given the recent surge in the development of SNNs, there is an urgent…

    Submitted 13 February, 2025; originally announced February 2025.

  14. arXiv:2502.07794  [pdf]

    cs.CY cs.AI

    Regulatory Science Innovation for Generative AI and Large Language Models in Health and Medicine: A Global Call for Action

    Authors: Jasmine Chiat Ling Ong, Yilin Ning, Mingxuan Liu, Yian Ma, Zhao Liang, Kuldev Singh, Robert T Chang, Silke Vogel, John CW Lim, Iris Siu Kwan Tan, Oscar Freyer, Stephen Gilbert, Danielle S Bitterman, Xiaoxuan Liu, Alastair K Denniston, Nan Liu

    Abstract: The integration of generative AI (GenAI) and large language models (LLMs) in healthcare presents both unprecedented opportunities and challenges, necessitating innovative regulatory approaches. GenAI and LLMs offer broad applications, from automating clinical workflows to personalizing diagnostics. However, the non-deterministic outputs, broad functionalities and complex integration of GenAI and L…

    Submitted 27 January, 2025; originally announced February 2025.

  15. arXiv:2502.02630  [pdf]

    q-bio.QM cs.AI cs.LG

    scBIT: Integrating Single-cell Transcriptomic Data into fMRI-based Prediction for Alzheimer's Disease Diagnosis

    Authors: Yu-An Huang, Yao Hu, Yue-Chao Li, Xiyue Cao, Xinyuan Li, Kay Chen Tan, Zhu-Hong You, Zhi-An Huang

    Abstract: Functional MRI (fMRI) and single-cell transcriptomics are pivotal in Alzheimer's disease (AD) research, each providing unique insights into neural function and molecular mechanisms. However, integrating these complementary modalities remains largely unexplored. Here, we introduce scBIT, a novel method for enhancing AD prediction by combining fMRI with single-nucleus RNA (snRNA). scBIT leverages sn…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 31 pages, 5 figures

  16. arXiv:2502.01692  [pdf, other]

    cs.LG cs.AI

    Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation

    Authors: Kim Yong Tan, Yueming Lyu, Ivor Tsang, Yew-Soon Ong

    Abstract: Guided diffusion-model generation is a promising direction for customizing the generation process of a pre-trained diffusion model to address specific downstream tasks. Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. However, for most real-world tasks, offline datasets are often unavail…

    Submitted 29 March, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

  17. arXiv:2501.18157  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

    Authors: Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come with several constraints such as increased sensory requirements, computational cost, and modality synchronization, to mention a few. These challenges constrain t…

    Submitted 30 January, 2025; originally announced January 2025.

  18. arXiv:2501.15129  [pdf, other]

    cs.NE

    EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning

    Authors: Bowen Zheng, Ran Cheng, Kay Chen Tan

    Abstract: Evolutionary Reinforcement Learning (EvoRL) has emerged as a promising approach to overcoming the limitations of traditional reinforcement learning (RL) by integrating the Evolutionary Computation (EC) paradigm with RL. However, the population-based nature of EC significantly increases computational costs, thereby restricting the exploration of algorithmic design choices and scalability in large-s…

    Submitted 2 February, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

  19. arXiv:2501.08165  [pdf, other]

    cs.SE cs.AI

    I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution

    Authors: Soohyeon Choi, Yong Kiam Tan, Mark Huasong Meng, Mohamed Ragab, Soumik Mondal, David Mohaisen, Khin Mi Mi Aung

    Abstract: Source code authorship attribution is important in software forensics, plagiarism detection, and protecting software patch integrity. Existing techniques often rely on supervised machine learning, which struggles with generalization across different programming languages and coding styles due to the need for large labeled datasets. Inspired by recent advances in natural language authorship analysi…

    Submitted 14 January, 2025; originally announced January 2025.

    Comments: 12 pages, 5 figures

  20. arXiv:2501.02857  [pdf, other]

    cs.NE cs.HC cs.LG

    ParetoLens: A Visual Analytics Framework for Exploring Solution Sets of Multi-objective Evolutionary Algorithms

    Authors: Yuxin Ma, Zherui Zhang, Ran Cheng, Yaochu Jin, Kay Chen Tan

    Abstract: In the domain of multi-objective optimization, evolutionary algorithms are distinguished by their capability to generate a diverse population of solutions that navigate the trade-offs inherent among competing objectives. This has catalyzed the ascension of evolutionary multi-objective optimization (EMO) as a prevalent approach. Despite the effectiveness of the EMO paradigm, the analysis of resulta…

    Submitted 6 January, 2025; originally announced January 2025.

    Comments: Accepted by IEEE Computational Intelligence Magazine

  21. arXiv:2412.20177  [pdf, other]

    cs.CV cs.DB

    Mining Platoon Patterns from Traffic Videos

    Authors: Yijun Bei, Teng Ma, Dongxiang Zhang, Sai Wu, Kian-Lee Tan, Gang Chen

    Abstract: Discovering co-movement patterns from urban-scale video data sources has emerged as an attractive topic. This task aims to identify groups of objects that travel together along a common route, which offers effective support for government agencies in enhancing smart city management. However, the previous work has made a strong assumption on the accuracy of recovered trajectories from videos and th…

    Submitted 1 January, 2025; v1 submitted 28 December, 2024; originally announced December 2024.

  22. arXiv:2412.11538  [pdf, other]

    cs.CL cs.AI eess.AS

    MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond

    Authors: Muhammad Huzaifah, Geyu Lin, Tianchi Liu, Hardik B. Sailor, Kye Min Tan, Tarun K. Vangani, Qiongqiong Wang, Jeremy H. M. Wong, Nancy F. Chen, Ai Ti Aw

    Abstract: This technical report describes the MERaLiON-SpeechEncoder, a foundation model designed to support a wide range of downstream speech applications. Developed as part of Singapore's National Multimodal Large Language Model Programme, the MERaLiON-SpeechEncoder is tailored to address the speech processing needs in Singapore and the surrounding Southeast Asian region. The model currently supports main…

    Submitted 20 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

  23. PhishIntel: Toward Practical Deployment of Reference-Based Phishing Detection

    Authors: Yuexin Li, Hiok Kuek Tan, Qiaoran Meng, Mei Lin Lock, Tri Cao, Shumin Deng, Nay Oo, Hoon Wei Lim, Bryan Hooi

    Abstract: Phishing is a critical cyber threat, exploiting deceptive tactics to compromise victims and cause significant financial losses. While reference-based phishing detectors (RBPDs) have achieved notable advancements in detection accuracy, their real-world deployment is hindered by challenges such as high latency and inefficiency in URL analysis. To address these limitations, we present PhishIntel, an…

    Submitted 14 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: Accepted by WWW 2025 (Demo Track)

  24. arXiv:2412.03701  [pdf]

    cs.LG

    Interpretable Hierarchical Attention Network for Medical Condition Identification

    Authors: Dongping Fang, Lian Duan, Xiaojing Yuan, Allyn Klunder, Kevin Tan, Suiting Cao, Yeqing Ji, Mike Xu

    Abstract: Accurate prediction of medical conditions with straight past clinical evidence is a long-sought topic in the medical management and health insurance field. Although great progress has been made with machine learning algorithms, the medical community is still skeptical about the model accuracy and interpretability. This paper presents an innovative hierarchical attention deep learning model to achi…

    Submitted 4 December, 2024; originally announced December 2024.

  25. arXiv:2411.14252  [pdf, other]

    cs.CL cs.AI

    Intent-Aware Dialogue Generation and Multi-Task Contrastive Learning for Multi-Turn Intent Classification

    Authors: Junhua Liu, Yong Keat Tan, Bin Fu, Kwan Hui Lim

    Abstract: Generating large-scale, domain-specific, multilingual multi-turn dialogue datasets remains a significant hurdle for training effective Multi-Turn Intent Classification models in chatbot systems. In this paper, we introduce Chain-of-Intent, a novel mechanism that combines Hidden Markov Models with Large Language Models (LLMs) to generate contextually aware, intent-driven conversations through self-…

    Submitted 21 November, 2024; originally announced November 2024.

  26. arXiv:2411.12307  [pdf, other]

    cs.CL cs.AI cs.IR

    Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production

    Authors: Junhua Liu, Yong Keat Tan, Bin Fu, Kwan Hui Lim

    Abstract: Accurate multi-turn intent classification is essential for advancing conversational AI systems. However, challenges such as the scarcity of comprehensive datasets and the complexity of contextual dependencies across dialogue turns hinder progress. This paper presents two novel approaches leveraging Large Language Models (LLMs) to enhance scalability and reduce latency in production dialogue system…

    Submitted 19 November, 2024; originally announced November 2024.

  27. arXiv:2411.06491  [pdf, other]

    cs.NE

    MBL-CPDP: A Multi-objective Bilevel Method for Cross-Project Defect Prediction via Automated Machine Learning

    Authors: Jiaxin Chen, Jinliang Ding, Kay Chen Tan, Jiancheng Qian, Ke Li

    Abstract: Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce. However, developing a robust ML pipeline with optimal hyperparameters that effectively use cross-project information and yield satisfactory performance remains challenging. In this paper, we resolve this bottleneck by formulat…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: 37 pages

  28. arXiv:2411.02448  [pdf, other]

    cs.CL cs.AI

    Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models

    Authors: Aliyah R. Hsu, James Zhu, Zhichao Wang, Bin Bi, Shubham Mehrotra, Shiva K. Pentyala, Katherine Tan, Xiang-Bo Mao, Roshanak Omrani, Sougata Chaudhuri, Regunathan Radhakrishnan, Sitaram Asur, Claire Na Cheng, Bin Yu

    Abstract: LLMs have demonstrated impressive proficiency in generating coherent and high-quality text, making them valuable across a range of text-generation tasks. However, rigorous evaluation of this generated content is crucial, as ensuring its quality remains a significant challenge due to persistent issues such as factual inaccuracies and hallucination. This paper introduces three fine-tuned general-pur…

    Submitted 18 February, 2025; v1 submitted 2 November, 2024; originally announced November 2024.

  29. arXiv:2411.00625  [pdf, other]

    cs.NE cs.LG

    Toward Automated Algorithm Design: A Survey and Practical Guide to Meta-Black-Box-Optimization

    Authors: Zeyuan Ma, Hongshu Guo, Yue-Jiao Gong, Jun Zhang, Kay Chen Tan

    Abstract: In this survey, we introduce Meta-Black-Box-Optimization (MetaBBO) as an emerging avenue within the Evolutionary Computation (EC) community, which incorporates Meta-learning approaches to assist automated algorithm design. Despite the success of MetaBBO, the current literature provides insufficient summaries of its key aspects and lacks practical guidance for implementation. To bridge this gap, we…

    Submitted 30 April, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

  30. arXiv:2410.09773  [pdf, other]

    cs.CL

    A Mixed-Language Multi-Document News Summarization Dataset and a Graphs-Based Extract-Generate Model

    Authors: Shengxiang Gao, Fang nan, Yongbing Zhang, Yuxin Huang, Kaiwen Tan, Zhengtao Yu

    Abstract: Existing research on news summarization primarily focuses on single-language single-document (SLSD), single-language multi-document (SLMD) or cross-language single-document (CLSD). However, in real-world scenarios, news about a international event often involves multiple documents in different languages, i.e., mixed-language multi-document (MLMD). Therefore, summarizing MLMD news is of great signi…

    Submitted 13 October, 2024; originally announced October 2024.

  31. arXiv:2410.04785  [pdf, other]

    eess.AS cs.SD

    Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet

    Authors: Xiang Hao, Chenxiang Ma, Qu Yang, Jibin Wu, Kay Chen Tan

    Abstract: Speech enhancement is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids. This work proposes an ultra-low-power speech…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: under review

  32. arXiv:2410.02629  [pdf, other]

    math.ST cs.LG stat.ME

    Estimating Generalization Performance Along the Trajectory of Proximal SGD in Robust Regression

    Authors: Kai Tan, Pierre C. Bellec

    Abstract: This paper studies the generalization performance of iterates obtained by Gradient Descent (GD), Stochastic Gradient Descent (SGD) and their proximal variants in high-dimensional robust regression problems. The number of features is comparable to the sample size and errors may be heavy-tailed. We introduce estimators that precisely track the generalization error of the iterates along the trajector…

    Submitted 3 November, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

    Comments: Camera-ready version of NeurIPS 2024 paper

  33. arXiv:2410.02210  [pdf, other]

    cs.CL cs.LG

    Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference

    Authors: Wei Cheng, Tianlu Wang, Yanmin Ji, Fan Yang, Keren Tan, Yiyu Zheng

    Abstract: While in-context learning with large language models (LLMs) has shown impressive performance, we have discovered a unique miscalibration behavior where both correct and incorrect predictions are assigned the same level of confidence. We refer to this phenomenon as indiscriminate miscalibration. We found that traditional calibration metrics, such as Expected Calibrated Errors (ECEs), are unable to…

    Submitted 3 October, 2024; originally announced October 2024.

    Comments: 19 pages

  34. arXiv:2409.18893  [pdf, other]

    cs.LG

    HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models

    Authors: Yu Zhou, Xingyu Wu, Jibin Wu, Liang Feng, Kay Chen Tan

    Abstract: Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter…

    Submitted 27 September, 2024; originally announced September 2024.

  35. arXiv:2409.18636  [pdf, other]

    cs.CV

    Unsupervised Fingerphoto Presentation Attack Detection With Diffusion Models

    Authors: Hailin Li, Raghavendra Ramachandra, Mohamed Ragab, Soumik Mondal, Yong Kiam Tan, Khin Mi Mi Aung

    Abstract: Smartphone-based contactless fingerphoto authentication has become a reliable alternative to traditional contact-based fingerprint biometric systems owing to rapid advances in smartphone camera technology. Despite its convenience, fingerprint authentication through fingerphotos is more vulnerable to presentation attacks, which has motivated recent research efforts towards developing fingerphoto Pr…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: Accepted by IJCB 2024

  36. arXiv:2409.04270  [pdf, other]

    cs.NE

    Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models

    Authors: Yuxiao Huang, Xuebin Lv, Shenghao Wu, Jibin Wu, Liang Feng, Kay Chen Tan

    Abstract: Evolutionary Multi-task Optimization (EMTO) is a paradigm that leverages knowledge transfer across simultaneously optimized tasks for enhanced search performance. To facilitate EMTO's performance, various knowledge transfer models have been developed for specific optimization tasks. However, designing these models often requires substantial expert knowledge. Recently, large language models (LLMs)…

    Submitted 6 September, 2024; originally announced September 2024.

    Comments: 10 pages, 11 pages

  37. arXiv:2409.00735  [pdf, other]

    cs.AI cs.LG

    AgGym: An agricultural biotic stress simulation environment for ultra-precision management planning

    Authors: Mahsa Khosravi, Matthew Carroll, Kai Liang Tan, Liza Van der Laan, Joscif Raigne, Daren S. Mueller, Arti Singh, Aditya Balu, Baskar Ganapathysubramanian, Asheesh Kumar Singh, Soumik Sarkar

    Abstract: Agricultural production requires careful management of inputs such as fungicides, insecticides, and herbicides to ensure a successful crop that is high-yielding, profitable, and of superior seed quality. Current state-of-the-art field crop management relies on coarse-scale crop management strategies, where entire fields are sprayed with pest and disease-controlling chemicals, leading to increased…

    Submitted 1 September, 2024; originally announced September 2024.

  38. arXiv:2408.14917  [pdf, other]

    cs.NE

    PMSN: A Parallel Multi-compartment Spiking Neuron for Multi-scale Temporal Processing

    Authors: Xinyi Chen, Jibin Wu, Chenxiang Ma, Yinsong Yan, Yujie Wu, Kay Chen Tan

    Abstract: Spiking Neural Networks (SNNs) hold great potential to realize brain-inspired, energy-efficient computational systems. However, current SNNs still fall short in terms of multi-scale temporal processing compared to their biological counterparts. This limitation has resulted in poor performance in many pattern recognition tasks with information that varies across different timescales. To address thi…

    Submitted 27 August, 2024; originally announced August 2024.

  39. arXiv:2408.11330  [pdf, other]

    cs.LG cs.CL

    Design Principle Transfer in Neural Architecture Search via Large Language Models

    Authors: Xun Zhou, Xingyu Wu, Liang Feng, Zhichao Lu, Kay Chen Tan

    Abstract: Transferable neural architecture search (TNAS) has been introduced to design efficient neural architectures for multiple tasks, to enhance the practical applicability of NAS in real-world scenarios. In TNAS, architectural knowledge accumulated in previous search processes is reused to warm up the architecture search for new tasks. However, existing TNAS methods still search in an extensive search…

    Submitted 17 December, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  40. arXiv:2408.08044  [pdf, other]

    cs.CE

    Crystalline Material Discovery in the Era of Artificial Intelligence

    Authors: Zhenzhong Wang, Haowei Hua, Wanyu Lin, Ming Yang, Kay Chen Tan

    Abstract: Crystalline materials, with symmetrical and periodic structures, exhibit a wide spectrum of properties and have been widely used in numerous applications across electronics, energy, and beyond. For crystalline materials discovery, traditional experimental and computational approaches are time-consuming and expensive. In these years, thanks to the explosive amount of crystalline materials data, gre…

    Submitted 1 February, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

  41. arXiv:2408.07176  [pdf, other]

    cs.NE

    Surrogate-Assisted Search with Competitive Knowledge Transfer for Expensive Optimization

    Authors: Xiaoming Xue, Yao Hu, Liang Feng, Kai Zhang, Linqi Song, Kay Chen Tan

    Abstract: Expensive optimization problems (EOPs) have attracted increasing research attention over the decades due to their ubiquity in a variety of practical applications. Despite many sophisticated surrogate-assisted evolutionary algorithms (SAEAs) that have been developed for solving such problems, most of them lack the ability to transfer knowledge from previously-solved tasks and always start their sea…

    Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 22 pages, 14 figures

  42. arXiv:2408.06468  [pdf, other]

    cs.SD cs.MM eess.AS eess.SP

    FoVNet: Configurable Field-of-View Speech Enhancement with Low Computation and Distortion for Smart Glasses

    Authors: Zhongweiyang Xu, Ali Aroudi, Ke Tan, Ashutosh Pandey, Jung-Suk Lee, Buye Xu, Francesco Nesta

    Abstract: This paper presents a novel multi-channel speech enhancement approach, FoVNet, that enables highly efficient speech enhancement within a configurable field of view (FoV) of a smart-glasses user without needing specific target-talker(s) directions. It advances over prior works by enhancing all speakers within any given FoV, with a hybrid signal processing and deep learning approach designed with hi…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted by INTERSPEECH2024

  43. arXiv:2408.04526  [pdf, other]

    stat.ML cs.LG

    Hybrid Reinforcement Learning Breaks Sample Size Barriers in Linear MDPs

    Authors: Kevin Tan, Wei Fan, Yuting Wei

    Abstract: Hybrid Reinforcement Learning (RL), where an agent learns from both an offline dataset and online explorations in an unknown environment, has garnered significant recent interest. A crucial question posed by Xie et al. (2022) is whether hybrid RL can improve upon the existing lower bounds established in purely offline and purely online RL without relying on the single-policy concentrability assump…

    Submitted 8 August, 2024; originally announced August 2024.

  44. arXiv:2408.01093  [pdf, other]

    cs.MA cs.RO

    CommonUppRoad: A Framework of Formal Modelling, Verifying, Learning, and Visualisation of Autonomous Vehicles

    Authors: Rong Gu, Kaige Tan, Andreas Holck Høeg-Petersen, Lei Feng, Kim Guldstrand Larsen

    Abstract: Combining machine learning and formal methods (FMs) provides a possible solution to overcome the safety issue of autonomous driving (AD) vehicles. However, there are gaps to be bridged before this combination becomes practically applicable and useful. In an attempt to facilitate researchers in both FMs and AD areas, this paper proposes a framework that combines two well-known tools, namely CommonR…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 20 pages, 5 figures, ISoLA 2024

  45. arXiv:2407.19430  [pdf, other]

    cs.CV

    Progressive Domain Adaptation for Thermal Infrared Object Tracking

    Authors: Qiao Li, Kanlun Tan, Qiao Liu, Di Yuan, Xin Li, Yunpeng Liu

    Abstract: Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant drop-off in TIR data due to the domain shift issue. To this end, in this work, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful kno…

    Submitted 3 September, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: 10 pages, 8 figures

  46. arXiv:2407.02639  [pdf, other]

    cs.CV

    Holistically-Nested Structure-Aware Graph Neural Network for Road Extraction

    Authors: Tinghuai Wang, Guangming Wang, Kuan Eeik Tan

    Abstract: Convolutional neural networks (CNN) have made significant advances in detecting roads from satellite images. However, existing CNN approaches are generally repurposed semantic segmentation architectures and suffer from the poor delineation of long and curved regions. Lack of overall road topology and structure information further deteriorates their performance on challenging remote sensing images.…

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  47. arXiv:2406.14359  [pdf, other]

    cs.NE

    Learning to Transfer for Evolutionary Multitasking

    Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

    Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited numbe…

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  48. arXiv:2406.12313  [pdf]

    cs.DB

    A framework for developing a knowledge management platform

    Authors: Marie Lisandra Zepeda Mendoza, Sonali Agarwal, James A. Blackshaw, Vanesa Bol, Audrey Fazzi, Filippo Fiorini, Amy Louise Foreman, Nancy George, Brett R. Johnson, Brian Martin, Dave McComb, Euphemia Mutasa-Gottgens, Helen Parkinson, Martin Romacker, Rolf Russell, Valérien Ségard, Shawn Zheng Kai Tan, Wei Kheng Teh, F. P. Winstanley, Benedict Wong, Adrian M. Smith

    Abstract: Knowledge management (KM) involves collecting, organizing, storing, and disseminating information to improve decision-making, innovation, and performance. Implementing KM at scale has become essential for organizations to effectively leverage vast accessible data. This paper is a compilation of concepts that emerged from KM workshops hosted by EMBL-EBI, attended by SMEs and industry. We provide gu…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 18 pages, 1 figure

  49. arXiv:2406.11809  [pdf, other]

    cs.LG cs.RO eess.SY

    Physics-Constrained Learning for PDE Systems with Uncertainty Quantified Port-Hamiltonian Models

    Authors: Kaiyuan Tan, Peilun Li, Thomas Beckers

    Abstract: Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic p…

    Submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables