Showing 1–50 of 308 results for author: Tan, W

  1. arXiv:2506.23767  [pdf, ps, other]

    q-fin.RM cs.LG

    Explainable AI for Comprehensive Risk Assessment for Financial Reports: A Lightweight Hierarchical Transformer Network Approach

    Authors: Xue Wen Tan, Stanley Kok

    Abstract: Every publicly traded U.S. company files an annual 10-K report containing critical insights into financial health and risk. We propose Tiny eXplainable Risk Assessor (TinyXRA), a lightweight and explainable transformer-based model that automatically assesses company risk from these reports. Unlike prior work that relies solely on the standard deviation of excess returns (adjusted for the Fama-Fren…

    Submitted 30 June, 2025; originally announced June 2025.

  2. arXiv:2506.09935  [pdf, ps, other]

    cs.CV

    LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation

    Authors: Jiangyong Huang, Xiaojian Ma, Xiongkun Linghu, Yue Fan, Junchao He, Wenxin Tan, Qing Li, Song-Chun Zhu, Yixin Chen, Baoxiong Jia, Siyuan Huang

    Abstract: Developing 3D-VL generalists capable of understanding 3D scenes and following natural language instructions to perform a wide range of tasks has been a long-standing goal in the 3D-VL community. Despite recent progress, 3D-VL models still lag behind their 2D counterparts in capability and robustness, falling short of the generalist standard. A key obstacle to developing 3D-VL generalists lies in d…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Project page: https://leo-vl.github.io

  3. arXiv:2506.07634  [pdf, ps, other]

    eess.AS cs.MM

    SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

    Authors: Chenyu Yang, Shuai Wang, Hangting Chen, Wei Tan, Jianwei Yu, Haizhou Li

    Abstract: Generating music with coherent structure, harmonious instrumental and vocal elements remains a significant challenge in song generation. Existing language models and diffusion-based methods often struggle to balance global coherence with local fidelity, resulting in outputs that lack musicality or suffer from incoherent progression and mismatched lyrics. This paper introduces SongBloom,…

    Submitted 23 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Submitted to NeurIPS2025

  4. arXiv:2506.07520  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    LeVo: High-Quality Song Generation with Multi-Preference Alignment

    Authors: Shun Lei, Yaoxun Xu, Zhiwei Lin, Huaicheng Zhang, Wei Tan, Hangting Chen, Jianwei Yu, Yixuan Zhang, Chenyu Yang, Haina Zhu, Shuai Wang, Zhiyong Wu, Dong Yu

    Abstract: Recent advances in large language models (LLMs) and audio language models have significantly improved music generation, particularly in lyrics-to-song generation. However, existing approaches still struggle with the complex composition of songs and the scarcity of high-quality data, leading to limitations in sound quality, musicality, instruction following, and vocal-instrument harmony. To address…

    Submitted 15 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  5. arXiv:2505.16552  [pdf, ps, other]

    cs.CL

    Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

    Authors: Wenhui Tan, Jiaze Li, Jianzhong Ju, Zhenbo Luo, Jian Luan, Ruihua Song

    Abstract: Large Language Models (LLMs) achieve superior performance through Chain-of-Thought (CoT) reasoning, but these token-level reasoning chains are computationally expensive and inefficient. In this paper, we introduce Compressed Latent Reasoning (CoLaR), a novel framework that dynamically compresses reasoning processes in latent space through a two-stage training approach. First, during supervised fin…

    Submitted 3 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 15 pages, 8 figures

  6. arXiv:2505.10769  [pdf]

    cs.CV

    Unifying Segment Anything in Microscopy with Multimodal Large Language Model

    Authors: Manyu Li, Ruian He, Zixian Zhang, Weimin Tan, Bo Yan

    Abstract: Accurate segmentation of regions of interest in biomedical images holds substantial value in image analysis. Although several foundation models for biomedical segmentation have currently achieved excellent performance on certain datasets, they typically demonstrate sub-optimal performance on unseen domain data. We attribute this deficiency to the lack of vision-language knowledge before segmentation. Multimo…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 18 pages, 9 figures

    MSC Class: 68T99

  7. TrumorGPT: Graph-Based Retrieval-Augmented Large Language Model for Fact-Checking

    Authors: Ching Nam Hang, Pei-Duo Yu, Chee Wei Tan

    Abstract: In the age of social media, the rapid spread of misinformation and rumors has led to the emergence of infodemics, where false information poses a significant threat to society. To combat this issue, we introduce TrumorGPT, a novel generative artificial intelligence solution designed for fact-checking in the health domain. TrumorGPT aims to distinguish "trumors", which are health-related rumors tha…

    Submitted 22 June, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  8. arXiv:2505.06152  [pdf, ps, other]

    cs.CV cs.AI

    MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks

    Authors: Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan

    Abstract: Medical vision-language models (VLMs) have shown promise as clinical assistants across various medical fields. However, a specialized dermatology VLM capable of delivering professional and detailed diagnostic analysis remains underdeveloped, primarily due to less specialized text descriptions in current dermatology multimodal datasets. To address this issue, we propose MM-Skin, the first large-scale…

    Submitted 9 May, 2025; originally announced May 2025.

  9. arXiv:2505.04361  [pdf]

    cs.CE

    RDPP-TD: Reputation and Data Privacy-Preserving based Truth Discovery Scheme in Mobile Crowdsensing

    Authors: Lijian Wu, Weikun Xie, Wei Tan, Tian Wang, Houbing Herbert Song, Anfeng Liu

    Abstract: Truth discovery (TD) plays an important role in Mobile Crowdsensing (MCS). However, existing TD methods, including privacy-preserving TD approaches, estimate the truth by weighting only the data submitted in the current round, which often results in low data quality. Moreover, there is a lack of effective TD methods that preserve both reputation and data privacy. To address these issues, a Reputat…

    Submitted 7 May, 2025; originally announced May 2025.

  10. arXiv:2505.03792  [pdf, ps, other]

    cs.LG cs.AI

    Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning

    Authors: Lang Feng, Weihao Tan, Zhiyi Lyu, Longtao Zheng, Haiyang Xu, Ming Yan, Fei Huang, Bo An

    Abstract: Online fine-tuning vision-language model (VLM) agents with reinforcement learning (RL) has shown promise for equipping agents with multi-step, goal-oriented capabilities in dynamic environments. However, their open-ended textual action space and non-end-to-end nature of action generation present significant challenges to effective online exploration in RL, e.g., explosion of the exploration space.…

    Submitted 3 June, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  11. arXiv:2504.18087  [pdf, other]

    cs.CV

    Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation

    Authors: Weipeng Tan, Chuming Lin, Chengming Xu, FeiFan Xu, Xiaobin Hu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yanwei Fu

    Abstract: Recent advances in Talking Head Generation (THG) have achieved impressive lip synchronization and visual quality through diffusion models; yet existing methods struggle to generate emotionally expressive portraits while preserving speaker identity. We identify three critical limitations in current emotional talking head generation: insufficient utilization of audio's inherent emotional cues, ident…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2409.03270

  12. arXiv:2504.16068  [pdf, other]

    physics.comp-ph cs.LG physics.chem-ph

    High-performance training and inference for deep equivariant interatomic potentials

    Authors: Chuin Wei Tan, Marc L. Descoteaux, Mit Kotak, Gabriel de Miranda Nascimento, Seán R. Kavanagh, Laura Zichi, Menghang Wang, Aadit Saluja, Yizhong R. Hu, Tess Smidt, Anders Johansson, William C. Witt, Boris Kozinsky, Albert Musaelian

    Abstract: Machine learning interatomic potentials, particularly those based on deep equivariant neural networks, have demonstrated state-of-the-art accuracy and computational efficiency in atomistic modeling tasks like molecular dynamics and high-throughput screening. The size of datasets and demands of downstream workflows are growing rapidly, making robust and scalable software essential. This work presen…

    Submitted 22 April, 2025; originally announced April 2025.

  13. arXiv:2504.13234  [pdf, other]

    cs.LG cs.AI

    Non-Uniform Class-Wise Coreset Selection: Characterizing Category Difficulty for Data-Efficient Transfer Learning

    Authors: Hanyu Zhang, Zhen Xing, Wenxuan Yang, Chenxi Ma, Weimin Tan, Bo Yan

    Abstract: As transfer learning models and datasets grow larger, efficient adaptation and storage optimization have become critical needs. Coreset selection addresses these challenges by identifying and retaining the most informative samples, constructing a compact subset for target domain training. However, current methods primarily rely on instance-level difficulty assessments, overlooking crucial category…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 11 pages

  14. arXiv:2504.13219  [pdf, other]

    cs.LG cs.AI

    Scaling Laws for Data-Efficient Visual Transfer Learning

    Authors: Wenxuan Yang, Qingqu Wei, Chenxi Ma, Weimin Tan, Bo Yan

    Abstract: Current scaling laws for visual AI models focus predominantly on large-scale pretraining, leaving a critical gap in understanding how performance scales for data-constrained downstream tasks. To address this limitation, this paper establishes the first practical framework for data-efficient scaling laws in visual transfer learning, addressing two fundamental questions: 1) How do scaling behaviors…

    Submitted 17 April, 2025; originally announced April 2025.

  15. arXiv:2504.12816  [pdf, other]

    cs.CL

    SMARTe: Slot-based Method for Accountable Relational Triple extraction

    Authors: Xue Wen Tan, Stanley Kok

    Abstract: Relational Triple Extraction (RTE) is a fundamental task in Natural Language Processing (NLP). However, prior research has primarily focused on optimizing model performance, with limited efforts to understand the internal mechanisms driving these models. Many existing methods rely on complex preprocessing to induce specific interactions, often resulting in opaque systems that may not fully align w…

    Submitted 22 May, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  16. arXiv:2504.11259  [pdf, ps, other]

    cs.DB

    The Cambridge Report on Database Research

    Authors: Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael Cafarella, Surajit Chaudhuri, Susan Davidson, David DeWitt, Yanlei Diao, Xin Luna Dong, Michael Franklin, Juliana Freire, Johannes Gehrke, Alon Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska, et al. (21 additional authors not shown)

    Abstract: On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long-standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five…

    Submitted 15 April, 2025; originally announced April 2025.

  17. arXiv:2504.01866  [pdf, other]

    cs.SE cs.AI cs.PL

    From Code Generation to Software Testing: AI Copilot with Context-Based RAG

    Authors: Yuchen Wang, Shangxin Guo, Chee Wei Tan

    Abstract: The rapid pace of large-scale software development places increasing demands on traditional testing methodologies, often leading to bottlenecks in efficiency, accuracy, and coverage. We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems that share a common goal, which is reducing bugs with limited resources. We extend…

    Submitted 5 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

    Comments: This work has been accepted for publication in IEEE Software (DOI: 10.1109/MS.2025.3549628)

  18. arXiv:2503.19700  [pdf]

    cs.CV

    Optimization of MedSAM model based on bounding box adaptive perturbation algorithm

    Authors: Boyi Li, Ye Yuan, Wenjun Tan

    Abstract: The MedSAM model, built upon the SAM framework, enhances medical image segmentation through generalizable training but still exhibits notable limitations. First, constraints in the perturbation window settings during training can cause MedSAM to incorrectly segment small tissues or organs together with adjacent structures, leading to segmentation errors. Second, when dealing with medical image tar…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 6 pages, 6 figures, 3 Tables

  19. arXiv:2503.16451  [pdf, other]

    cs.HC cs.AI cs.RO

    Think-Then-React: Towards Unconstrained Human Action-to-Reaction Generation

    Authors: Wenhui Tan, Boyuan Li, Chuhao Jin, Wenbing Huang, Xiting Wang, Ruihua Song

    Abstract: Modeling human-like action-to-reaction generation has significant real-world applications, like human-robot interaction and games. Despite recent advancements in single-person motion generation, it is still challenging to handle action-to-reaction generation well, due to the difficulty of directly predicting reaction from action sequence without prompts, and the absence of a unified representation…

    Submitted 19 February, 2025; originally announced March 2025.

    Comments: Accepted by ICLR 2025

  20. Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models

    Authors: Man Fai Wong, Chee Wei Tan

    Abstract: This paper studies how AI-assisted programming and large language models (LLM) improve software developers' ability via AI tools (LLM agents) like GitHub Copilot and Amazon CodeWhisperer, while integrating human feedback to enhance reinforcement learning (RLHF) with crowd-sourced computation to enhance text-to-code generation. Additionally, we demonstrate that our Bayesian optimization framework s…

    Submitted 19 March, 2025; originally announced March 2025.

  21. arXiv:2503.14895  [pdf, other]

    cs.CV cs.AI cs.CL

    Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

    Authors: Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in visual-language tasks. However, the authenticity of the responses generated by MLLMs is often compromised by object hallucinations. We identify that a key cause of these hallucinations is the model's over-susceptibility to specific image frequency features in detecting objects. In this paper, we introduc…

    Submitted 19 March, 2025; originally announced March 2025.

  22. arXiv:2503.09962  [pdf, other]

    cs.CV

    Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification

    Authors: Jiayu Jiang, Changxing Ding, Wentao Tan, Junhong Wang, Jin Tao, Xiangmin Xu

    Abstract: Text-to-image person re-identification (ReID) aims to retrieve the images of an interested person based on textual descriptions. One main challenge for this task is the high cost in manually annotating large-scale databases, which affects the generalization ability of ReID models. Recent works handle this problem by leveraging Multi-modal Large Language Models (MLLMs) to describe pedestrian images…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project website: https://github.com/sssaury/HAM

  23. arXiv:2503.05130  [pdf, other]

    cs.DC

    Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity

    Authors: Cunchi Lv, Xiao Shi, Zhengyu Lei, Jinyue Huang, Wenting Tan, Xiaohui Zheng, Xiaofang Zhao

    Abstract: Serverless computing, with its ease of management, auto-scaling, and cost-effectiveness, is widely adopted by deep learning (DL) applications. DL workloads, especially with large language models, require substantial GPU resources to ensure QoS. However, it is prone to produce GPU fragments (e.g., 15%-94%) in serverless DL systems due to the dynamicity of workloads and coarse-grained static GPU a…

    Submitted 6 March, 2025; originally announced March 2025.

  24. arXiv:2503.02550  [pdf, other]

    cs.DC

    SpecInF: Exploiting Idle GPU Resources in Distributed DL Training via Speculative Inference Filling

    Authors: Cunchi Lv, Xiao Shi, Dong Liang, Wenting Tan, Xiaofang Zhao

    Abstract: Deep Learning (DL), especially with Large Language Models (LLMs), brings benefits to various areas. However, DL training systems usually yield prominent idling GPU resources due to many factors, such as resource allocation and collective communication. To improve GPU utilization, we present SpecInF, which adopts a Speculative Inference Filling method to exploit idle GPU resources. It collocates ea…

    Submitted 26 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  25. arXiv:2502.20311  [pdf, other]

    cs.LG cs.SD eess.AS

    Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

    Authors: Marcus Yu Zhe Wee, Justin Juin Hng Wong, Lynus Lim, Joe Yu Wei Tan, Prannaya Gupta, Dillion Lim, En Hao Tew, Aloysius Keng Siew Han, Yong Zhi Lim

    Abstract: Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of…

    Submitted 27 February, 2025; originally announced February 2025.

  26. arXiv:2502.15122  [pdf, other]

    cs.LG

    MONSTER: Monash Scalable Time Series Evaluation Repository

    Authors: Angus Dempster, Navid Mohammadi Foumani, Chang Wei Tan, Lynn Miller, Amish Mishra, Mahsa Salehi, Charlotte Pelletier, Daniel F. Schmidt, Geoffrey I. Webb

    Abstract: We introduce MONSTER, the MONash Scalable Time Series Evaluation Repository, a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequ…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 45 pages; 38 figures

  27. arXiv:2502.04399  [pdf, other]

    cs.LG cs.AI eess.SY

    Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning

    Authors: Bokeng Zheng, Bo Rao, Tianxiang Zhu, Chee Wei Tan, Jingpu Duan, Zhi Zhou, Xu Chen, Xiaoxi Zhang

    Abstract: Advances in artificial intelligence (AI), including foundation models (FMs), are increasingly transforming human society, with smart cities driving the evolution of urban living. Meanwhile, vehicle crowdsensing (VCS) has emerged as a key enabler, leveraging vehicles' mobility and sensor-equipped capabilities. In particular, ride-hailing vehicles can effectively facilitate flexible data collection and…

    Submitted 6 February, 2025; originally announced February 2025.

  28. arXiv:2501.12148  [pdf, other]

    cs.IT

    Deep Unfolding of Fixed-Point Based Algorithm for Weighted Sum Rate Maximization

    Authors: Jan Christian Hauffen, Chee Wei Tan, Giuseppe Caire

    Abstract: In this paper, we propose a novel approach that harnesses the standard interference function, specifically tailored to address the unique challenges of non-convex optimization in wireless networks. We begin by establishing theoretical guarantees for our method under the assumption that the interference function exhibits log-concavity. Building on this foundation, we develop a Primal-Dual Algorithm…

    Submitted 22 January, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

  29. arXiv:2501.08809  [pdf, other]

    cs.SD cs.AI eess.AS

    XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

    Authors: Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu

    Abstract: In recent years, remarkable advancements in artificial intelligence-generated content (AIGC) have been achieved in the fields of image synthesis and text generation, generating content comparable to that produced by humans. However, the quality of AI-generated music has not yet reached this standard, primarily due to the challenge of effectively controlling musical emotions and ensuring high-quali…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: accepted by TMM

  30. arXiv:2501.01108  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization

    Authors: Haina Zhu, Yizhi Zhou, Hangting Chen, Jianwei Yu, Ziyang Ma, Rongzhi Gu, Yi Luo, Wei Tan, Xie Chen

    Abstract: Recent years have witnessed the success of foundation models pre-trained with self-supervised learning (SSL) in various music informatics understanding tasks, including music tagging, instrument classification, key detection, and more. In this paper, we propose a self-supervised music representation learning model for music understanding. Distinguished from previous studies adopting random project…

    Submitted 3 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

  31. arXiv:2412.15650  [pdf, other]

    cs.LG

    Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution

    Authors: Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding

    Abstract: Human preference alignment can greatly enhance Multimodal Large Language Models (MLLMs), but collecting high-quality preference data is costly. A promising solution is the self-evolution strategy, where models are iteratively trained on data they generate. However, current techniques still rely on human- or GPT-annotated data and sometimes require additional models or ground truth answers. To addr…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: AAAI 2025. The code is available at https://github.com/WentaoTan/SENA

  32. arXiv:2412.13786  [pdf, other]

    eess.AS cs.SD

    SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

    Authors: Chenyu Yang, Shuai Wang, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Yaoxun Xu, Yizhi Zhou, Haina Zhu, Haizhou Li

    Abstract: The emergence of novel generative modeling paradigms, particularly audio language models, has significantly advanced the field of song generation. Although state-of-the-art models are capable of synthesizing both vocals and accompaniment tracks up to several minutes long concurrently, research about partial adjustments or editing of existing songs is still underexplored, which allows for more flex…

    Submitted 28 January, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI2025

  33. arXiv:2412.13528  [pdf]

    cs.HC

    ScamGPT-J: Inside the Scammer's Mind, A Generative AI-Based Approach Toward Combating Messaging Scams

    Authors: Xue Wen Tan, Kenneth See, Stanley Kok

    Abstract: The increase in global cellphone usage has led to a spike in instant messaging scams, causing extensive socio-economic damage with yearly losses exceeding half a trillion US dollars. These scams pose a challenge to the integrity of justice systems worldwide due to their international nature, which complicates legal action. Scams often exploit emotional vulnerabilities, making detection difficult f…

    Submitted 18 December, 2024; originally announced December 2024.

    Comments: International Conference on Information Systems 2024

  34. arXiv:2411.14508  [pdf, other]

    cond-mat.mtrl-sci cs.CE

    Multi-objective Bayesian Optimisation of Spinodoid Cellular Structures for Crush Energy Absorption

    Authors: Hirak Kansara, Siamak F. Khosroshahi, Leo Guo, Miguel A. Bessa, Wei Tan

    Abstract: In the pursuit of designing safer and more efficient energy-absorbing structures, engineers must tackle the challenge of improving crush performance while balancing multiple conflicting objectives, such as maximising energy absorption and minimising peak impact forces. Accurately simulating real-world conditions necessitates the use of complex material models to replicate the non-linear behaviour…

    Submitted 24 February, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  35. arXiv:2411.11044  [pdf, ps, other]

    cs.CR cs.LG

    Efficient Federated Unlearning with Adaptive Differential Privacy Preservation

    Authors: Yu Jiang, Xindi Tong, Ziyao Liu, Huanyi Ye, Chee Wei Tan, Kwok-Yan Lam

    Abstract: Federated unlearning (FU) offers a promising solution to effectively address the need to erase the impact of specific clients' data on the global model in federated learning (FL), thereby granting individuals the "Right to be Forgotten". The most straightforward approach to achieve unlearning is to train the model from scratch, excluding clients who request data removal, but it is resource-intens…

    Submitted 17 November, 2024; originally announced November 2024.

  36. arXiv:2411.11039  [pdf, other]

    cs.LG cs.DC

    FedUHB: Accelerating Federated Unlearning via Polyak Heavy Ball Method

    Authors: Yu Jiang, Chee Wei Tan, Kwok-Yan Lam

    Abstract: Federated learning facilitates collaborative machine learning, enabling multiple participants to collectively develop a shared model while preserving the privacy of individual data. The growing importance of the "right to be forgotten" calls for effective mechanisms to facilitate data removal upon request. In response, federated unlearning (FU) has been developed to efficiently eliminate the influ…

    Submitted 17 November, 2024; originally announced November 2024.

  37. arXiv:2411.02115  [pdf, other]

    cs.LG cs.DC

    FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation

    Authors: Ziwei Zhan, Wenkuan Zhao, Yuanqing Li, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Chuan Wu, Deke Guo, Xu Chen

    Abstract: Federated learning (FL) is a collaborative machine learning approach that enables multiple clients to train models without sharing their private data. With the rise of deep learning, large-scale models have garnered significant attention due to their exceptional performance. However, a key challenge in FL is the limitation imposed by clients with constrained computational and communication resourc…

    Submitted 27 December, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 8 pages, 5 figures, accepted by The 20th International Conference on Mobility, Sensing and Networking (MSN 2024)

  38. FedReMa: Improving Personalized Federated Learning via Leveraging the Most Relevant Clients

    Authors: Han Liang, Ziwei Zhan, Weijie Liu, Xiaoxi Zhang, Chee Wei Tan, Xu Chen

    Abstract: Federated Learning (FL) is a distributed machine learning paradigm that achieves a globally robust model through decentralized computation and periodic model synthesis, primarily focusing on the global model's accuracy over aggregated datasets of all participating clients. Personalized Federated Learning (PFL) instead tailors exclusive models for each client, aiming to enhance the accuracy of clie…

    Submitted 26 November, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

    Comments: 8 pages, 4 figures, accepted by European Conference on Artificial Intelligence (2024 ECAI)

    Journal ref: In ECAI 2024 (pp. 2090-2097). IOS Press (2024)

  39. arXiv:2410.15285  [pdf, other]

    cs.AI

    Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework

    Authors: Yuchen Wang, Shangxin Guo, Chee Wei Tan

    Abstract: The advancements in cloud-based Large Language Models (LLMs) have revolutionized AI-assisted programming. However, their integration into certain local development environments like ones within the Apple software ecosystem (e.g., iOS apps, macOS) remains challenging due to computational demands and sandboxed constraints. This paper presents CAMP, a multi-model AI-assisted programming framework th…

    Submitted 5 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: This work is accepted to IEEE CAI2025

  40. arXiv:2410.08068  [pdf, other]

    cs.CL cs.AI

    Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models

    Authors: Wenting Tan, Dongxiao Chen, Jieting Xue, Zihao Wang, Taijie Chen

    Abstract: Large Language Models (LLMs) exhibit impressive performance across various domains but still struggle with arithmetic reasoning tasks. Recent work shows the effectiveness of prompt design methods in enhancing reasoning capabilities. However, these approaches overlook crucial requirements for prior knowledge of specific concepts, theorems, and tricks to tackle most arithmetic reasoning problems suc…

    Submitted 10 October, 2024; originally announced October 2024.

  41. arXiv:2410.07830  [pdf, ps, other]

    cs.CL

    NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models

    Authors: William Tan, Kevin Zhu

    Abstract: Large Language Models (LLMs) have demonstrated exceptional promise in translation tasks for high-resource languages. However, their performance in low-resource languages is limited by the scarcity of both parallel and monolingual corpora, as well as the presence of noise. Consequently, such LLMs suffer with alignment and have lagged behind State-of-The-Art (SoTA) neural machine translation (NMT) m…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: Accepted to SoLaR @ NeurIPS 2024

  42. arXiv:2410.04579  [pdf, other]

    cs.CL cs.LG stat.ML

    Upsample or Upweight? Balanced Training on Heavily Imbalanced Datasets

    Authors: Tianjian Li, Haoran Xu, Weiting Tan, Kenton Murray, Daniel Khashabi

    Abstract: Data abundance across different domains exhibits a long-tailed distribution: few domains have abundant data, while most face data scarcity. Our work focuses on a multilingual setting, where available data is heavily skewed towards high-resource languages. Two common strategies to address this disparity are upsampling low-resource data (Temperature Sampling) and upweighting low-resource loss (Scala…

    Submitted 9 March, 2025; v1 submitted 6 October, 2024; originally announced October 2024.

    Comments: 19 pages, 9 figures, accepted to NAACL 2025 main conference

  43. arXiv:2410.00168  [pdf, other]

    cs.CL cs.SD eess.AS

    SSR: Alignment-Aware Modality Connector for Speech Language Models

    Authors: Weiting Tan, Hirofumi Inaguma, Ning Dong, Paden Tomasello, Xutai Ma

    Abstract: Fusing speech into a pre-trained language model (SpeechLM) usually suffers from inefficient encoding of long-form speech and catastrophic forgetting of the pre-trained text modality. We propose SSR-Connector (Segmented Speech Representation Connector) for better modality fusion. Leveraging speech-text alignments, our approach segments and compresses speech features to match the granularity of text embed…

    Submitted 17 May, 2025; v1 submitted 30 September, 2024; originally announced October 2024.

    Comments: IWSLT 2025

  44. arXiv:2409.15574  [pdf, other]

    cs.CV

    Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model

    Authors: Jing Wei Tan, SeungKyu Kim, Eunsu Kim, Sung Hak Lee, Sangjeong Ahn, Won-Ki Jeong

    Abstract: Vision language models (VLMs) have achieved success in both natural language comprehension and image recognition tasks. However, their use in pathology report generation for whole slide images (WSIs) is still limited due to the huge size of multi-scale WSIs and the high cost of WSI annotation. Moreover, in most of the existing research on pathology report generation, sufficient validation regarding…

    Submitted 23 September, 2024; originally announced September 2024.

  45. arXiv:2409.14801  [pdf, other]

    cs.CL

    MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations

    Authors: Gia-Bao Dinh Ho, Chang Wei Tan, Zahra Zamanzadeh Darban, Mahsa Salehi, Gholamreza Haffari, Wray Buntine

    Abstract: Detecting critical moments, such as emotional outbursts or changes in decisions during conversations, is crucial for understanding shifts in human behavior and their consequences. Our work introduces a novel problem setting focusing on these moments as turning points (TPs), accompanied by a meticulously curated, high-consensus, human-annotated multi-modal dataset. We provide precise timestamps, de…

    Submitted 23 September, 2024; originally announced September 2024.

    Comments: Accepted by ACL 2024 main conference

  46. arXiv:2409.13312  [pdf, other]

    cs.CL cs.AI

    GAProtoNet: A Multi-head Graph Attention-based Prototypical Network for Interpretable Text Classification

    Authors: Ximing Wen, Wenjuan Tan, Rosina O. Weber

    Abstract: Pretrained transformer-based Language Models (LMs) are well-known for their ability to achieve significant improvement on text classification tasks with their powerful word embeddings, but their black-box nature, which leads to a lack of interpretability, has been a major concern. In this work, we introduce GAProtoNet, a novel white-box Multi-head Graph Attention-based Prototypical Network designe…

    Submitted 19 December, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: 8 pages, 5 figures, accepted by COLING 2025

  47. arXiv:2409.13216  [pdf, other]

    cs.SD eess.AS

    MuCodec: Ultra Low-Bitrate Music Codec

    Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Shun Lei, Zhiwei Lin, Zhiyong Wu

    Abstract: Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on modeling semantic or acoustic information cannot effectively reconstruct music with both vocals and backgrounds. To address this issue, we propose MuCod…

    Submitted 28 September, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  48. arXiv:2409.12964  [pdf, ps, other]

    cs.IT cs.AI

    OpenRANet: Neuralized Spectrum Access by Joint Subcarrier and Power Allocation with Optimization-based Deep Learning

    Authors: Siya Chen, Chee Wei Tan, Xiangping Zhai, H. Vincent Poor

    Abstract: The next-generation radio access network (RAN), known as Open RAN, is poised to feature an AI-native interface for wireless cellular networks, including emerging satellite-terrestrial systems, making deep learning integral to its operation. In this paper, we address the nonconvex optimization challenge of joint subcarrier and power allocation in Open RAN, with the objective of minimizing the total…

    Submitted 10 February, 2025; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the IEEE Transactions on Green Communications and Networking

  49. FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model

    Authors: Jianzhi Lu, Ruian He, Shili Zhou, Weimin Tan, Bo Yan

    Abstract: Facial movements play a crucial role in conveying attitude and intentions, and facial optical flow provides a dynamic and detailed representation of them. However, the scarcity of datasets and the lack of a modern baseline hinder progress in facial optical flow research. This paper proposes FacialFlowNet (FFN), a novel large-scale facial optical flow dataset, and the Decomposed Facial Flow Model (DecFlow),…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: ACMMM2024

  50. arXiv:2409.03270  [pdf, other]

    cs.CV

    SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model

    Authors: Weipeng Tan, Chuming Lin, Chengming Xu, Xiaozhong Ji, Junwei Zhu, Chengjie Wang, Yunsheng Wu, Yanwei Fu

    Abstract: Talking Head Generation (THG), typically driven by audio, is an important and challenging task with broad application prospects in various fields such as digital humans, film production, and virtual reality. While diffusion model-based THG methods present high-quality and stable content generation, they often overlook the intrinsic style which encompasses personalized features such as speaking hab…

    Submitted 28 November, 2024; v1 submitted 5 September, 2024; originally announced September 2024.