
Showing 1–50 of 615 results for author: Dong, X

Searching in archive cs.
  1. arXiv:2505.09655  [pdf, other]

    cs.CL

    DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models

    Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Xuanzhao Dong, Hao Wang, Haiyu Wu, Huayu Li, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    Abstract: Recent advances in reinforcement learning for language model post-training, such as Group Relative Policy Optimization (GRPO), have shown promise in low-resource settings. However, GRPO typically relies on solution-level and scalar reward signals that fail to capture the semantic diversity among sampled completions. This leads to what we identify as a diversity-quality inconsistency, where distinc…

    Submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2505.02179  [pdf, other]

    cs.CV

    ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

    Authors: Tao Zhu, Qi Yu, Xinru Dong, Shiyu Li, Yue Liu, Jinlong Jiang, Lei Shu

    Abstract: Weakly-supervised video anomaly detection (WS-VAD) using Multiple Instance Learning (MIL) suffers from label ambiguity, hindering discriminative feature learning. We propose ProDisc-VAD, an efficient framework tackling this via two synergistic components. The Prototype Interaction Layer (PIL) provides controlled normality modeling using a small set of learnable prototypes, establishing a robust ba…

    Submitted 4 May, 2025; originally announced May 2025.

  3. arXiv:2505.02018  [pdf, ps, other]

    cs.CV

    R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

    Authors: Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxi Song, Haoyang Peng, Yi-Xuan Deng, Xinzhi Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-min Hu

    Abstract: Reasoning stands as a cornerstone of intelligence, enabling the synthesis of existing knowledge to solve complex problems. Despite remarkable progress, existing reasoning benchmarks often fail to rigorously evaluate the nuanced reasoning capabilities required for complex, real-world problem-solving, particularly in multi-disciplinary and multimodal contexts. In this paper, we introduce a graduate-l…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 18 pages

  4. arXiv:2505.00236  [pdf]

    cs.LG

    Node2Vec-DGI-EL: A Hierarchical Graph Representation Learning Model for Ingredient-Disease Association Prediction

    Authors: Leifeng Zhang, Xin Dong, Shuaibing Jia, Jianhua Zhang

    Abstract: Traditional Chinese medicine, as an essential component of traditional medicine, contains active ingredients that serve as a crucial source for modern drug development, holding immense therapeutic potential and development value. A multi-layered and complex network is formed from Chinese medicine to diseases and used to predict the potential associations between Chinese medicine ingredients and di…

    Submitted 30 April, 2025; originally announced May 2025.

  5. arXiv:2504.21252  [pdf, other]

    cs.CL

    Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA

    Authors: Xuanzhao Dong, Wenhui Zhu, Hao Wang, Xiwen Chen, Peijie Qiu, Rui Yin, Yi Su, Yalin Wang

    Abstract: Medical question answering (QA) is a reasoning-intensive task that remains challenging for large language models (LLMs) due to hallucinations and outdated domain knowledge. Retrieval-Augmented Generation (RAG) provides a promising post-training solution by leveraging external knowledge. However, existing medical RAG systems suffer from two key limitations: (1) a lack of modeling for human-like rea…

    Submitted 29 April, 2025; originally announced April 2025.

  6. arXiv:2504.21064  [pdf, other]

    cs.LG cs.AI

    Frequency Feature Fusion Graph Network For Depression Diagnosis Via fNIRS

    Authors: Chengkai Yang, Xingping Dong, Xiaofen Zong

    Abstract: Data-driven approaches for depression diagnosis have emerged as a significant research focus in neuromedicine, driven by the development of relevant datasets. Recently, graph neural network (GNN)-based models have gained widespread adoption due to their ability to capture brain channel functional connectivity from both spatial and temporal perspectives. However, their effectiveness is hindered by…

    Submitted 29 April, 2025; originally announced April 2025.

  7. arXiv:2504.21063  [pdf, other]

    cs.LG cs.AI

    Token-Level Prompt Mixture with Parameter-Free Routing for Federated Domain Generalization

    Authors: Shuai Gong, Chaoran Cui, Xiaolin Dong, Xiushan Nie, Lei Zhu, Xiaojun Chang

    Abstract: Federated domain generalization (FedDG) aims to learn a globally generalizable model from decentralized clients with heterogeneous data while preserving privacy. Recent studies have introduced prompt learning to adapt vision-language models (VLMs) in FedDG by learning a single global prompt. However, such a one-prompt-fits-all learning paradigm typically leads to performance degradation on persona…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: The manuscript has been submitted to IEEE Transactions on Knowledge and Data Engineering

  8. arXiv:2504.20319  [pdf, other]

    cs.LG

    Bayesian Experimental Design for Model Discrepancy Calibration: An Auto-Differentiable Ensemble Kalman Inversion Approach

    Authors: Huchen Yang, Xinghao Dong, Jin-Long Wu

    Abstract: Bayesian experimental design (BED) offers a principled framework for optimizing data acquisition by leveraging probabilistic inference. However, practical implementations of BED are often compromised by model discrepancy, i.e., the mismatch between predictive models and true physical systems, which can potentially lead to biased parameter estimates. While data-driven approaches have been recently…

    Submitted 28 April, 2025; originally announced April 2025.

  9. arXiv:2504.18201  [pdf, other]

    cs.CV cs.AI

    Multi-Grained Compositional Visual Clue Learning for Image Intent Recognition

    Authors: Yin Tang, Jiankai Li, Hongyu Yang, Xuan Dong, Lifeng Fan, Weixin Li

    Abstract: In an era where social media platforms abound, individuals frequently share images that offer insights into their intents and interests, impacting individual life quality and societal stability. Traditional computer vision tasks, such as object detection and semantic segmentation, focus on concrete visual representations, while intent recognition relies more on implicit visual clues. This poses ch…

    Submitted 25 April, 2025; originally announced April 2025.

  10. arXiv:2504.16524  [pdf, other]

    cs.IR

    Modality Reliability Guided Multimodal Recommendation

    Authors: Xue Dong, Xuemeng Song, Na Zheng, Sicheng Zhao, Guiguang Ding

    Abstract: Multimodal recommendation suffers from performance degradation, in that uni-modal recommendation sometimes achieves better performance. A possible reason is that unreliable item modality data hurts the fusion result. Several existing studies have introduced weights for different modalities to reduce the contribution of the unreliable modality data in predicting the final user rati…

    Submitted 23 April, 2025; originally announced April 2025.

  11. arXiv:2504.16096  [pdf, other]

    q-bio.NC cs.AI cs.CV

    BrainPrompt: Multi-Level Brain Prompt Enhancement for Neurological Condition Identification

    Authors: Jiaxing Xu, Kai He, Yue Tang, Wei Li, Mengcheng Lan, Xia Dong, Yiping Ke, Mengling Feng

    Abstract: Neurological conditions, such as Alzheimer's Disease, are challenging to diagnose, particularly in the early stages where symptoms closely resemble healthy controls. Existing brain network analysis methods primarily focus on graph-based models that rely solely on imaging data, which may overlook important non-imaging factors and limit the model's predictive power and interpretability. In this pape…

    Submitted 12 April, 2025; originally announced April 2025.

  12. arXiv:2504.16053  [pdf, other]

    cs.CL cs.AI

    LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

    Authors: Zhifan Ye, Kejing Xia, Yonggan Fu, Xin Dong, Jihoon Hong, Xiangchi Yuan, Shizhe Diao, Jan Kautz, Pavlo Molchanov, Yingyan Celine Lin

    Abstract: State space models (SSMs) have emerged as an efficient alternative to Transformer models for language modeling, offering linear computational complexity and constant memory usage as context length increases. However, despite their efficiency in handling long contexts, recent studies have shown that SSMs, such as Mamba models, generally underperform compared to Transformers in long-context understa…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Accepted by ICLR 2025

  13. arXiv:2504.13904  [pdf, other]

    cs.HC cs.AI cs.CL

    Generative Framework for Personalized Persuasion: Inferring Causal, Counterfactual, and Latent Knowledge

    Authors: Donghuo Zeng, Roberto Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang

    Abstract: We hypothesize that optimal system responses emerge from adaptive strategies grounded in causal and counterfactual knowledge. Counterfactual inference allows us to create hypothetical scenarios to examine the effects of alternative system responses. We enhance this process through causal discovery, which identifies the strategies informed by the underlying causal structure that govern system behav…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 12 pages, 10 figures, 1 table. Accepted by ACM UMAP 2025

  14. arXiv:2504.13161  [pdf, other]

    cs.CL

    CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

    Authors: Shizhe Diao, Yu Yang, Yonggan Fu, Xin Dong, Dan Su, Markus Kliegl, Zijia Chen, Peter Belcak, Yoshi Suhara, Hongxu Yin, Mostofa Patwary, Yingyan Lin, Jan Kautz, Pavlo Molchanov

    Abstract: Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: 20 pages, 9 figures

  15. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong , et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  16. arXiv:2504.12959  [pdf, other]

    cs.CV

    Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction

    Authors: Dubing Chen, Huan Zheng, Jin Fang, Xingping Dong, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen

    Abstract: We present GDFusion, a temporal fusion method for vision-based 3D semantic occupancy prediction (VisionOcc). GDFusion opens up the underexplored aspects of temporal fusion within the VisionOcc framework, focusing on both temporal cues and fusion strategies. It systematically examines the entire VisionOcc pipeline, identifying three fundamental yet previously overlooked temporal cues: scene-level c…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  17. arXiv:2504.12545  [pdf, other]

    cs.CY cs.AI cs.CL

    Knowledge Acquisition on Mass-shooting Events via LLMs for AI-Driven Justice

    Authors: Benign John Ihugba, Afsana Nasrin, Ling Wu, Lin Li, Lijun Qian, Xishuang Dong

    Abstract: Mass-shooting events pose a significant challenge to public safety, generating large volumes of unstructured textual data that hinder effective investigations and the formulation of public policy. Despite the urgency, few prior studies have effectively automated the extraction of key information from these events to support legal and investigative efforts. This paper presents the first dataset de…

    Submitted 16 April, 2025; originally announced April 2025.

  18. arXiv:2504.11431  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG cs.SI

    Masculine Defaults via Gendered Discourse in Podcasts and Large Language Models

    Authors: Maria Teleki, Xiangjue Dong, Haoran Liu, James Caverlee

    Abstract: Masculine defaults are widely recognized as a significant type of gender bias, but they are often unseen as they are under-researched. Masculine defaults involve three key parts: (i) the cultural context, (ii) the masculine characteristics or behaviors, and (iii) the reward for, or simply acceptance of, those masculine characteristics or behaviors. In this work, we study discourse-based masculine…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: To appear in ICWSM 2025

  19. arXiv:2504.11349  [pdf, other]

    cs.CV cs.AI cs.GR

    Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A systematic literature review

    Authors: Yuezhe Yang, Boyu Yang, Yaqian Wang, Yang He, Xingbo Dong, Zhe Jin

    Abstract: The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach to enhancing reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting cl…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 43 pages, 5 figures, submitted to Medical Image Analysis

    MSC Class: 68T45; ACM Class: I.4.5

  20. arXiv:2504.11259  [pdf, ps, other]

    cs.DB

    The Cambridge Report on Database Research

    Authors: Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael Cafarella, Surajit Chaudhuri, Susan Davidson, David DeWitt, Yanlei Diao, Xin Luna Dong, Michael Franklin, Juliana Freire, Johannes Gehrke, Alon Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska , et al. (21 additional authors not shown)

    Abstract: On October 19 and 20, 2023, the authors of this report convened in Cambridge, MA, to discuss the state of the database research field, its recent accomplishments and ongoing challenges, and future directions for research and community engagement. This gathering continues a long standing tradition in the database community, dating back to the late 1980s, in which researchers meet roughly every five…

    Submitted 15 April, 2025; originally announced April 2025.

  21. arXiv:2504.10685  [pdf, other]

    cs.CV cs.AI

    NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results

    Authors: Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong, Xiugang Dong, Hong Gao, Xiangsheng Zhou, Jiancheng Pan, Yanxing Liu, Xiao He, Jiahao Li, Yuze Sun, Xiaomeng Huang, Zhenyu Zhang, Ran Ma, Yuhan Liu, Zijian Zhuang, Shuai Yi, Yixiong Zou , et al. (37 additional authors not shown)

    Abstract: Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains. In conjunction with NTIRE 2025, we organized the 1st CD-FSOD Challenge, aiming to advance the performance of current object detectors on entirely novel target domains with only limited labeled data. The challenge attracted 152 registe…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: accepted by CVPRW 25 @ NTIRE

  22. arXiv:2504.07957  [pdf, other]

    cs.CV

    MM-IFEngine: Towards Multimodal Instruction Following

    Authors: Shengyuan Ding, Shenxi Wu, Xiangyu Zhao, Yuhang Zang, Haodong Duan, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Dahua Lin, Jiaqi Wang

    Abstract: The Instruction Following (IF) ability measures how well Multi-modal Large Language Models (MLLMs) understand exactly what users are telling them and whether they are doing it right. Existing multimodal instruction following training data is scarce, the benchmarks are simple with atomic instructions, and the evaluation strategies are imprecise for tasks demanding exact output constraints. To addre…

    Submitted 27 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

  23. arXiv:2504.06232  [pdf, other]

    cs.CV

    HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

    Authors: Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

    Abstract: Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential…

    Submitted 8 April, 2025; originally announced April 2025.

  24. arXiv:2504.05351  [pdf]

    cs.RO

    Development and Experimental Evaluation of a Vibration-Based Adhesion System for Miniature Wall-Climbing Robots

    Authors: Siqian Li, Jung-Che Chang, Xi Wang, Xin Dong

    Abstract: In recent years, miniature wall-climbing robots have attracted widespread attention due to their significant potential in equipment inspection and in-situ repair applications. Traditional wall-climbing systems typically rely on electromagnetic, electrostatic, vacuum suction, or van der Waals forces for controllable adhesion. However, these conventional methods impose limitations when striving for…

    Submitted 6 April, 2025; originally announced April 2025.

  25. arXiv:2504.05344  [pdf]

    cs.LG cs.AI

    Divergent Paths: Separating Homophilic and Heterophilic Learning for Enhanced Graph-level Representations

    Authors: Han Lei, Jiaxing Xu, Xia Dong, Yiping Ke

    Abstract: Graph Convolutional Networks (GCNs) are predominantly tailored for graphs displaying homophily, where similar nodes connect, but often fail on heterophilic graphs. The strategy of adopting distinct approaches to learn from homophilic and heterophilic components in node-level tasks has been widely discussed and proven effective both theoretically and experimentally. However, in graph-level tasks, r…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: 10 pages, 6 figures

  26. arXiv:2504.03773  [pdf, other]

    cs.LG cs.AI

    SHapley Estimated Explanation (SHEP): A Fast Post-Hoc Attribution Method for Interpreting Intelligent Fault Diagnosis

    Authors: Qian Chen, Xingjian Dong, Zhike Peng, Guang Meng

    Abstract: Despite significant progress in intelligent fault diagnosis (IFD), the lack of interpretability remains a critical barrier to practical industrial applications, driving the growth of interpretability research in IFD. Post-hoc interpretability has gained popularity due to its ability to preserve network flexibility and scalability without modifying model structures. However, these methods often yie…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 16 pages, 21 figures

  27. arXiv:2504.01004  [pdf, other]

    cs.CV

    Schrödinger Diffusion Driven Signal Recovery in 3T BOLD fMRI Using Unmatched 7T Observations

    Authors: Yujian Xiong, Xuanzhao Dong, Sebastian Waz, Wenhui Zhu, Negar Mallak, Zhong-lin Lu, Yalin Wang

    Abstract: Ultra-high-field (7 Tesla) BOLD fMRI offers exceptional detail in both spatial and temporal domains, along with robust signal-to-noise characteristics, making it a powerful modality for studying visual information processing in the brain. However, due to the limited accessibility of 7T scanners, the majority of neuroimaging studies are still conducted using 3T systems, which inherently suffer from…

    Submitted 13 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

  28. arXiv:2503.16544  [pdf, other]

    cs.CL cs.AI cs.HC

    Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies

    Authors: Donghuo Zeng, Roberto Legaspi, Yuewen Sun, Xinshuai Dong, Kazushi Ikeda, Peter Spirtes, Kun Zhang

    Abstract: Tailoring persuasive conversations to users leads to more effective persuasion. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a novel method that leverages causal discovery and counterfactual reasoning for optimizing system persuasion capability and outcomes. We employ the Greedy Relaxation of the Sparsest Permutation (GRaSP) al…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 21 pages, 8 figures

  29. arXiv:2503.13503  [pdf, other]

    cs.LG cs.CL cs.DL cs.IR

    SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models

    Authors: Chuan Qin, Xin Chen, Chengrui Wang, Pengmin Wu, Xi Chen, Yihang Cheng, Jingyi Zhao, Meng Xiao, Xiangchao Dong, Qingqing Long, Boya Pan, Han Wu, Chengzan Li, Yuanchun Zhou, Hui Xiong, Hengshu Zhu

    Abstract: In recent years, the rapid advancement of Artificial Intelligence (AI) technologies, particularly Large Language Models (LLMs), has revolutionized the paradigm of scientific discovery, establishing AI-for-Science (AI4Science) as a dynamic and evolving field. However, there is still a lack of an effective framework for the overall assessment of AI4Science, particularly from a holistic perspective o…

    Submitted 12 March, 2025; originally announced March 2025.

  30. arXiv:2503.13322  [pdf]

    cs.LG

    SMPR: A structure-enhanced multimodal drug-disease prediction model for drug repositioning and cold start

    Authors: Xin Dong, Rui Miao, Suyan Zhang, Shuaibing Jia, Leifeng Zhang, Yong Liang, Jianhua Zhang, Yi Zhun Zhu

    Abstract: Repositioning drug-disease relationships has always been a hot field of research. However, actual cases of biologically validated drug relocation remain very limited, and existing models have not yet fully utilized the structural information of the drug. Furthermore, most repositioning models are only used to complete the relationship matrix, and their practicality is poor when dealing with drug c…

    Submitted 17 March, 2025; originally announced March 2025.

  31. arXiv:2503.13269  [pdf, other]

    cs.DB

    DAgent: A Relational Database-Driven Data Analysis Report Generation Agent

    Authors: Wenyi Xu, Yuren Mao, Xiaolu Zhang, Chao Zhang, Xuemei Dong, Mengfei Zhang, Yunjun Gao

    Abstract: Relational database-driven data analysis (RDB-DA) report generation, which aims to generate data analysis reports after querying relational databases, has been widely applied in fields such as finance and healthcare. Typically, these tasks are manually completed by data scientists, making the process very labor-intensive and showing a clear need for automation. Although existing methods (e.g., Tab…

    Submitted 1 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  32. arXiv:2503.13012  [pdf, other]

    cs.CV cs.AI

    Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation

    Authors: Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li

    Abstract: Although domain generalization (DG) has significantly addressed the performance degradation of pre-trained models caused by domain shifts, it often falls short in real-world deployment. Test-time adaptation (TTA), which adjusts a learned model using unlabeled test data, presents a promising solution. However, most existing TTA methods struggle to deliver strong performance in medical image segmenta…

    Submitted 17 March, 2025; originally announced March 2025.

  33. arXiv:2503.11301  [pdf, other]

    cs.CL cs.MA

    GNNs as Predictors of Agentic Workflow Performances

    Authors: Yuanshuo Zhang, Yuchen Hou, Bohan Tang, Shuo Chen, Muhan Zhang, Xiaowen Dong, Siheng Chen

    Abstract: Agentic workflows invoked by Large Language Models (LLMs) have achieved remarkable success in handling complex tasks. However, optimizing such workflows is costly and inefficient in real-world applications due to extensive invocations of LLMs. To fill this gap, this position paper formulates agentic workflows as computational graphs and advocates Graph Neural Networks (GNNs) as efficient predictor…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 15 pages, 11 figures

  34. arXiv:2503.09916  [pdf, other]

    cs.LG

    Type Information-Assisted Self-Supervised Knowledge Graph Denoising

    Authors: Jiaqi Sun, Yujia Zheng, Xinshuai Dong, Haoyue Dai, Kun Zhang

    Abstract: Knowledge graphs serve as critical resources supporting intelligent systems, but they can be noisy due to imperfect automatic generation processes. Existing approaches to noise detection often rely on external facts, logical rule constraints, or structural embeddings. These methods are often challenged by imperfect entity alignment, flexible knowledge graph construction, and overfitting on structu…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: Accepted by AISTATS 2025

  35. Adaptive Backdoor Attacks with Reasonable Constraints on Graph Neural Networks

    Authors: Xuewen Dong, Jiachen Li, Shujun Li, Zhichao You, Qiang Qu, Yaroslav Kholodov, Yulong Shen

    Abstract: Recent studies show that graph neural networks (GNNs) are vulnerable to backdoor attacks. Existing backdoor attacks against GNNs use fixed-pattern triggers and lack reasonable trigger constraints, overlooking individual graph characteristics and rendering insufficient evasiveness. To tackle the above issues, we propose ABARC, the first Adaptive Backdoor Attack with Reasonable Constraints, applying…

    Submitted 12 March, 2025; originally announced March 2025.

    Journal ref: IEEE Transactions on Dependable and Secure Computing, 2025

  36. arXiv:2503.09008  [pdf, other]

    cs.LG cs.AI

    Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement

    Authors: Huidong Liang, Haitz Sáez de Ocáriz Borde, Baskaran Sripathmanathan, Michael Bronstein, Xiaowen Dong

    Abstract: Long-range dependencies are critical for effective graph representation learning, yet most existing datasets focus on small graphs tailored to inductive tasks, offering limited insight into long-range interactions. Current evaluations primarily compare models employing global attention (e.g., graph transformers) with those using local neighborhood aggregation (e.g., message-passing neural networks…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: work in progress

  37. arXiv:2503.08760  [pdf, other]

    cs.LG cs.AI stat.ML

    Heterogeneous Graph Structure Learning through the Lens of Data-generating Processes

    Authors: Keyue Jiang, Bohan Tang, Xiaowen Dong, Laura Toni

    Abstract: Inferring the graph structure from observed data is a key task in graph machine learning to capture the intrinsic relationship between data entities. While significant advancements have been made in learning the structure of homogeneous graphs, many real-world graphs exhibit heterogeneous patterns where nodes and edges have multiple types. This paper fills this gap by introducing the first approac…

    Submitted 11 March, 2025; originally announced March 2025.

  38. arXiv:2503.07302  [pdf, other]

    cs.LG

    When Selection Meets Intervention: Additional Complexities in Causal Discovery

    Authors: Haoyue Dai, Ignavier Ng, Jianle Sun, Zeyu Tang, Gongxu Luo, Xinshuai Dong, Peter Spirtes, Kun Zhang

    Abstract: We address the common yet often-overlooked selection bias in interventional studies, where subjects are selectively enrolled into experiments. For instance, participants in a drug trial are usually patients of the relevant disease; A/B tests on mobile applications target existing users only, and gene perturbation studies typically focus on specific cell types, such as cancer cells. Ignoring this b…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Appears at ICLR 2025 (oral)

    Journal ref: Proceedings of the International Conference on Learning Representations (ICLR), 2025

  39. arXiv:2503.03987  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models

    Authors: Wenhui Zhu, Xin Li, Xiwen Chen, Peijie Qiu, Vamsi Krishna Vasa, Xuanzhao Dong, Yanxi Chen, Natasha Lepore, Oana Dumitrascu, Yi Su, Yalin Wang

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have gained significant attention for their remarkable ability to process and analyze non-textual data, such as images, videos, and audio. Notably, several adaptations of general-domain MLLMs to the medical field have been explored, including LLaVA-Med. However, these medical adaptations remain insufficiently advanced in understanding and interpre…

    Submitted 5 March, 2025; originally announced March 2025.

  40. arXiv:2503.03146  [pdf, other]

    cs.CR

    PriFFT: Privacy-preserving Federated Fine-tuning of Large Language Models via Hybrid Secret Sharing

    Authors: Zhichao You, Xuewen Dong, Ke Cheng, Xutong Mu, Jiaxuan Fu, Shiyang Ma, Qiang Qu, Yulong Shen

    Abstract: Fine-tuning large language models (LLMs) raises privacy concerns due to the risk of exposing sensitive training data. Federated learning (FL) mitigates this risk by keeping training samples on local devices, while facing the following problems in privacy-preserving federated fine-tuning. (i) Recent studies show that adversaries can still infer private information in FL. (ii) LLM parameters are sha…

    Submitted 13 May, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  41. arXiv:2503.02004  [pdf, other]

    eess.SP cs.IT eess.SY

    Group Sparsity Methods for Compressive Space-Frequency Channel Estimation and Spatial Equalization in Fluid Antenna System

    Authors: Xuehui Dong, Kai Wan, Shuangyang Li, Robert Caiming Qiu, Giuseppe Caire

    Abstract: Fluid Antenna System (FAS) unlocks unprecedented flexibility in wireless channel optimization through spatial reconfigurability. However, its practical deployment is hindered by the coupled challenges posed by high-dimensional channel estimation and real-time position optimization. This paper bridges wireless propagation physics with compressed sensing theory to address these challenges through th…

    Submitted 3 March, 2025; originally announced March 2025.

  42. arXiv:2503.01785  [pdf, other]

    cs.CV

    Visual-RFT: Visual Reinforcement Fine-Tuning

    Authors: Ziyu Liu, Zeyi Sun, Yuhang Zang, Xiaoyi Dong, Yuhang Cao, Haodong Duan, Dahua Lin, Jiaqi Wang

    Abstract: Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers, which is especially useful in applications when fine-tuning data is scarce. Recent open-source work like DeepSeek-R1 demonstrates that reinforcement learning with verifiable reward is one key direction in reproducing o1. While the R1-style model has demonstrated success in language models,…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: project page: https://github.com/Liuziyu77/Visual-RFT

  43. arXiv:2503.00639  [pdf, other]

    cs.LG stat.ML

    Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning

    Authors: Zijian Li, Shunxing Fan, Yujia Zheng, Ignavier Ng, Shaoan Xie, Guangyi Chen, Xinshuai Dong, Ruichu Cai, Kun Zhang

    Abstract: Disentangled representation learning aims to uncover latent variables underlying the observed data, and generally speaking, rather strong assumptions are needed to ensure identifiability. Some approaches rely on sufficient changes on the distribution of latent variables indicated by auxiliary variables such as domain indices, but acquiring enough domains is often challenging. Alternative approache…

    Submitted 1 March, 2025; originally announced March 2025.

  44. arXiv:2502.20295  [pdf, other]

    cs.LG cs.AI cs.CV

    Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription

    Authors: Benjamin Gutteridge, Matthew Thomas Jackson, Toni Kukurin, Xiaowen Dong

    Abstract: Handwritten text recognition (HTR) remains a challenging task, particularly for multi-page documents where pages share common formatting and contextual features. While modern optical character recognition (OCR) engines are proficient with printed text, their performance on handwriting is limited, often requiring costly labeled data for fine-tuning. In this paper, we explore the use of multi-modal…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 11 pages (including references and appendix), 14 figures, accepted at AAAI-25 Workshop on Document Understanding and Intelligence, non-archival

  45. arXiv:2502.19391  [pdf, other]

    q-bio.BM cs.LG

    Towards More Accurate Full-Atom Antibody Co-Design

    Authors: Jiayang Wu, Xingyi Zhang, Xiangyu Dong, Kun Xie, Ziqi Liu, Wensheng Gan, Sibo Wang, Le Song

    Abstract: Antibody co-design represents a critical frontier in drug development, where accurate prediction of both 1D sequence and 3D structure of complementarity-determining regions (CDRs) is essential for targeting specific epitopes. Despite recent advances in equivariant graph neural networks for antibody design, current approaches often fall short in capturing the intricate interactions that govern anti…

    Submitted 11 February, 2025; originally announced February 2025.

  46. arXiv:2502.19233  [pdf, other]

    cs.AR

    FPGA-based Emulation and Device-Side Management for CXL-based Memory Tiering Systems

    Authors: Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun

    Abstract: The Compute Express Link (CXL) technology facilitates the extension of CPU memory through byte-addressable SerDes links and cascaded switches, creating complex heterogeneous memory systems where CPU access to various endpoints differs in latency and bandwidth. Effective tiered memory management is essential for optimizing system performance in such systems. However, designing an effective memory t…

    Submitted 14 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  47. arXiv:2502.18309  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music

    Authors: Xinran Liu, Xu Dong, Diptesh Kanojia, Wenwu Wang, Zhenhua Feng

    Abstract: Generating high-quality full-body dance sequences from music is a challenging task as it requires strict adherence to genre-specific choreography. Moreover, the generated sequences must be both physically realistic and precisely synchronized with the beats and rhythm of the music. To overcome these challenges, we propose GCDance, a classifier-free diffusion framework for generating genre-specific…

    Submitted 25 February, 2025; originally announced February 2025.

  48. arXiv:2502.15130  [pdf, other]

    cs.CV

    TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

    Authors: Xiuwei Chen, Sihao Lin, Xiao Dong, Zisheng Chen, Meng Cao, Jianhua Han, Hang Xu, Xiaodan Liang

    Abstract: Transformers have been favored in both uni-modal and multi-modal foundation models for their flexible scalability in attention modules. Consequently, a number of pre-trained Transformer models, e.g., LLaVA, CLIP, and DEIT, are publicly available. Recent research has introduced subquadratic architectures like Mamba, which enables global awareness with linear complexity. Nevertheless, training speci…

    Submitted 20 February, 2025; originally announced February 2025.

  49. arXiv:2502.14260  [pdf, other]

    eess.IV cs.AI cs.CV

    EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement

    Authors: Wenhui Zhu, Xuanzhao Dong, Xin Li, Yujian Xiong, Xiwen Chen, Peijie Qiu, Vamsi Krishna Vasa, Zhangsihao Yang, Yi Su, Oana Dumitrascu, Yalin Wang

    Abstract: Over the past decade, generative models have achieved significant success in enhancing fundus images. However, the evaluation of these models still presents a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) The existing denoising metrics (e.g., PSNR, SSIM) are hard to extend to downstream real-world clinical r…

    Submitted 19 February, 2025; originally announced February 2025.

  50. arXiv:2502.13128  [pdf, other]

    cs.SD cs.AI

    SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

    Authors: Zihan Liu, Shuangrui Ding, Zhixiong Zhang, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

    Abstract: Text-to-song generation, the task of creating vocals and accompaniment from textual inputs, poses significant challenges due to domain complexity and data scarcity. Existing approaches often employ multi-stage generation procedures, resulting in cumbersome training and inference pipelines. In this paper, we propose SongGen, a fully open-source, single-stage auto-regressive transformer designed for…

    Submitted 18 February, 2025; originally announced February 2025.