Showing 1–50 of 1,209 results for author: Jiang, J

  1. arXiv:2505.10063  [pdf, ps, other]

    cs.CL

    CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability

    Authors: Han Peng, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Lei Fang

    Abstract: Advancements in Large Language Models (LLMs) have extended their input context length, yet they still struggle with retrieval and reasoning in long-context inputs. Existing methods propose to utilize the prompt strategy and retrieval head to alleviate this limitation. However, they still face challenges in balancing retrieval precision and recall, impacting their efficacy in answering questions. T… ▽ More

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09730  [pdf, ps, other]

    quant-ph cs.DS

    High-Temperature Fermionic Gibbs States are Mixtures of Gaussian States

    Authors: Akshar Ramkumar, Yiyi Cai, Yu Tong, Jiaqing Jiang

    Abstract: Efficient simulation of a quantum system generally relies on structural properties of the quantum state. Motivated by the recent results by Bakshi et al. on the sudden death of entanglement in high-temperature Gibbs states of quantum spin systems, we study the high-temperature Gibbs states of bounded-degree local fermionic Hamiltonians, which include the special case of geometrically local fermion… ▽ More

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 40 pages

  3. arXiv:2505.08672  [pdf, other]

    cs.CY

    How Students Use AI Feedback Matters: Experimental Evidence on Physics Achievement and Autonomy

    Authors: Xusheng Dai, Zhaochun Wen, Jianxiao Jiang, Huiqin Liu, Yu Zhang

    Abstract: Despite the precision and adaptiveness of generative AI (GAI)-powered feedback provided to students, existing practice and literature might ignore how usage patterns impact student learning. This study examines the heterogeneous effects of GAI-powered personalized feedback on high school students' physics achievement and autonomy through two randomized controlled trials, with a major focus on usag… ▽ More

    Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.08542  [pdf, other]

    cs.AI

    Guiding LLM-based Smart Contract Generation with Finite State Machine

    Authors: Hao Luo, Yuhao Lin, Xiao Yan, Xintong Hu, Yuxiang Wang, Qiming Zeng, Hao Wang, Jiawei Jiang

    Abstract: Smart contract is a kind of self-executing code based on blockchain technology with a wide range of application scenarios, but the traditional generation method relies on manual coding and expert auditing, which has a high threshold and low efficiency. Although Large Language Models (LLMs) show great potential in programming tasks, they still face challenges in smart contract generation w.r.t. eff… ▽ More

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.08168  [pdf, ps, other]

    cs.CL cs.AI

    Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuang Hu, Yuanyuan Zhu, Bo Du, Jia Wu, Jiawei Jiang

    Abstract: Text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. Existing work utilizes various graph-based augmentation techniques to train the node and text embeddings, while text-based augmentations are largely unexplored. In this paper, we propose Text Semantics… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2505.07687  [pdf, ps, other]

    eess.IV cs.CV

    ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation

    Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao

    Abstract: Accurate multi-modal medical image translation requires harmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for pr… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: MICCAI 2025 (under review)

  7. arXiv:2505.07344  [pdf, other]

    cs.CV cs.AI

    Generative Pre-trained Autoregressive Diffusion Transformer

    Authors: Yuan Zhang, Jiacheng Jiang, Guoqing Ma, Zhiying Lu, Haoyang Huang, Jianlong Yuan, Nan Duan

    Abstract: In this work, we present GPDiT, a Generative Pre-trained Autoregressive Diffusion Transformer that unifies the strengths of diffusion and autoregressive modeling for long-range video synthesis, within a continuous latent space. Instead of predicting discrete tokens, GPDiT autoregressively predicts future latent frames using a diffusion loss, enabling natural modeling of motion dynamics and semanti… ▽ More

    Submitted 15 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.07203  [pdf, other]

    cs.DC

    PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

    Authors: Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang

    Abstract: Besides typical generative applications, like ChatGPT, GitHub Copilot, and Cursor, we observe an emerging trend that LLMs are increasingly used in traditional discriminative tasks, such as recommendation, credit verification, and data labeling. The key characteristic of these emerging use cases is that the LLM generates only a single output token, rather than an arbitrarily long sequence of tokens… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  9. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  10. arXiv:2505.06991  [pdf, ps, other]

    cs.CV

    Technical Report for ICRA 2025 GOOSE 2D Semantic Segmentation Challenge: Leveraging Color Shift Correction, RoPE-Swin Backbone, and Quantile-based Label Denoising Strategy for Robust Outdoor Scene Understanding

    Authors: Chih-Chung Hsu, I-Hsuan Wu, Wen-Hai Tseng, Ching-Heng Cheng, Ming-Hsuan Wu, Jin-Hui Jiang, Yu-Jou Hsiao

    Abstract: This report presents our semantic segmentation framework developed by team ACVLAB for the ICRA 2025 GOOSE 2D Semantic Segmentation Challenge, which focuses on parsing outdoor scenes into nine semantic categories under real-world conditions. Our method integrates a Swin Transformer backbone enhanced with Rotary Position Embedding (RoPE) for improved spatial generalization, alongside a Color Shift E… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

  11. arXiv:2505.06897  [pdf, other]

    cs.AI

    Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

    Authors: Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong

    Abstract: The ultimate goal of artificial intelligence (AI) is to achieve Artificial General Intelligence (AGI). Embodied Artificial Intelligence (EAI), which involves intelligent systems with physical presence and real-time interaction with the environment, has emerged as a key research direction in pursuit of AGI. While advancements in deep learning, reinforcement learning, large-scale language models, an… ▽ More

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 19 pages, 7 figures, 3 tables

  12. arXiv:2505.06133  [pdf, ps, other]

    cs.CV

    BrainSegDMlF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation

    Authors: Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jia-Xuan Jiang, Xiaodong Zhang, Hao Zheng, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng

    Abstract: The segmentation of substantial brain lesions is a significant and challenging task in the field of medical image segmentation. Substantial brain lesions in brain imaging exhibit high heterogeneity, with indistinct boundaries between lesion regions and normal brain tissue. Small lesions in single slices are difficult to identify, making the accurate and reproducible segmentation of abnormal region… ▽ More

    Submitted 9 May, 2025; originally announced May 2025.

  13. arXiv:2505.05089  [pdf, other]

    cs.CV

    Nonlinear Motion-Guided and Spatio-Temporal Aware Network for Unsupervised Event-Based Optical Flow

    Authors: Zuntao Liu, Hao Zhuang, Junjie Jiang, Yuhang Song, Zheng Fang

    Abstract: Event cameras have the potential to capture continuous motion information over time and space, making them well-suited for optical flow estimation. However, most existing learning-based methods for event-based optical flow adopt frame-based techniques, ignoring the spatio-temporal characteristics of events. Additionally, these methods assume linear motion between consecutive events within the loss… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted to ICRA 2025. Project Page: https://wynelio.github.io/E-NMSTFlow

  14. arXiv:2505.04992  [pdf, other]

    stat.ML cs.LG stat.AP

    Boosting Statistic Learning with Synthetic Data from Pretrained Large Models

    Authors: Jialong Jiang, Wenkang Hu, Jian Huang, Yuling Jiao, Xu Liu

    Abstract: The rapid advancement of generative models, such as Stable Diffusion, raises a key question: how can synthetic data from these models enhance predictive modeling? While they can generate vast amounts of datasets, only a subset meaningfully improves performance. We propose a novel end-to-end framework that generates and systematically filters synthetic data through domain-specific statistical metho… ▽ More

    Submitted 8 May, 2025; originally announced May 2025.

  15. arXiv:2505.02922  [pdf, ps, other]

    cs.LG

    RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

    Authors: Yaoqi Chen, Jinkai Zhang, Baotong Lu, Qianxi Zhang, Chengruidong Zhang, Jingjia Luo, Di Liu, Huiqiang Jiang, Qi Chen, Jing Liu, Bailu Ding, Xiao Yan, Jiawei Jiang, Chen Chen, Mingxing Zhang, Yuqing Yang, Fan Yang, Mao Yang

    Abstract: The growing context lengths of large language models (LLMs) pose significant challenges for efficient inference, primarily due to GPU memory and bandwidth constraints. We present RetroInfer, a novel system that reconceptualizes the key-value (KV) cache as a vector storage system which exploits the inherent attention sparsity to accelerate long-context LLM inference. At its core is the wave index,… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages

  16. arXiv:2505.02796  [pdf, other]

    cs.GT cs.LG

    Adaptive Bidding Policies for First-Price Auctions with Budget Constraints under Non-stationarity

    Authors: Yige Wang, Jiashuo Jiang

    Abstract: We study how a budget-constrained bidder should learn to adaptively bid in repeated first-price auctions to maximize her cumulative payoff. This problem arose due to an industry-wide shift from second-price auctions to first-price auctions in display advertising recently, which renders truthful bidding (i.e., always bidding one's private value) no longer optimal. We propose a simple dual-gradient-… ▽ More

    Submitted 5 May, 2025; originally announced May 2025.

  17. arXiv:2505.02784  [pdf, other]

    cs.CV

    Advances in Automated Fetal Brain MRI Segmentation and Biometry: Insights from the FeTA 2024 Challenge

    Authors: Vladyslav Zalevskyi, Thomas Sanchez, Misha Kaandorp, Margaux Roulet, Diego Fajardo-Rojas, Liu Li, Jana Hutter, Hongwei Bran Li, Matthew Barkovich, Hui Ji, Luca Wilhelmi, Aline Dändliker, Céline Steger, Mériam Koob, Yvan Gomez, Anton Jakovčić, Melita Klaić, Ana Adžić, Pavel Marković, Gracia Grabarić, Milan Rados, Jordina Aviles Verdera, Gregor Kasprian, Gregor Dovjak, Raphael Gaubert-Rachmühl, et al. (45 additional authors not shown)

    Abstract: Accurate fetal brain tissue segmentation and biometric analysis are essential for studying brain development in utero. The FeTA Challenge 2024 advanced automated fetal brain MRI analysis by introducing biometry prediction as a new task alongside tissue segmentation. For the first time, our diverse multi-centric test set included data from a new low-field (0.55T) MRI dataset. Evaluation metrics wer… ▽ More

    Submitted 8 May, 2025; v1 submitted 5 May, 2025; originally announced May 2025.

  18. arXiv:2505.02179  [pdf, other]

    cs.CV

    ProDisc-VAD: An Efficient System for Weakly-Supervised Anomaly Detection in Video Surveillance Applications

    Authors: Tao Zhu, Qi Yu, Xinru Dong, Shiyu Li, Yue Liu, Jinlong Jiang, Lei Shu

    Abstract: Weakly-supervised video anomaly detection (WS-VAD) using Multiple Instance Learning (MIL) suffers from label ambiguity, hindering discriminative feature learning. We propose ProDisc-VAD, an efficient framework tackling this via two synergistic components. The Prototype Interaction Layer (PIL) provides controlled normality modeling using a small set of learnable prototypes, establishing a robust ba… ▽ More

    Submitted 4 May, 2025; originally announced May 2025.

  19. arXiv:2505.01729  [pdf, ps, other]

    cs.CV

    PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth

    Authors: Bu Jin, Weize Li, Baihan Yang, Zhenxin Zhu, Junpeng Jiang, Huan-ang Gao, Haiyang Sun, Kun Zhan, Hengtong Hu, Xueyang Zhang, Peng Jia, Hao Zhao

    Abstract: Recent advancements in autonomous driving (AD) systems have highlighted the potential of world models in achieving robust and generalizable performance across both ordinary and challenging driving conditions. However, a key challenge remains: precise and flexible camera pose control, which is crucial for accurate viewpoint transformation and realistic simulation of scene dynamics. In this paper, w… ▽ More

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures

  20. arXiv:2505.01286  [pdf, other]

    cs.LG cs.AI

    2DXformer: Dual Transformers for Wind Power Forecasting with Dual Exogenous Variables

    Authors: Yajuan Zhang, Jiahai Jiang, Yule Yan, Liang Yang, Ping Zhang

    Abstract: Accurate wind power forecasting can help formulate scientific dispatch plans, which is of great significance for maintaining the safety, stability, and efficient operation of the power system. In recent years, wind power forecasting methods based on deep learning have focused on extracting the spatiotemporal correlations among data, achieving significant improvements in forecasting accuracy. Howev… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: Accepted by ICDM 2024

  21. arXiv:2505.01224  [pdf, other]

    cs.CV eess.IV

    RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement

    Authors: Kui Jiang, Yan Luo, Junjun Jiang, Xin Xu, Fei Ma, Fei Yu

    Abstract: Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications, where wavelength-dependent attenuation causes severe content degradation and color distortion. While recent state space models like Mamba show potential for long-range dependency modeling, their unfolding operations and fixed scan paths on 1D sequences fail to adapt to local object semantics and glo… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  22. arXiv:2505.01075  [pdf, other]

    cs.LG

    Federated Adapter on Foundation Models: An Out-Of-Distribution Approach

    Authors: Yiyuan Yang, Guodong Long, Tianyi Zhou, Qinghua Lu, Shanshan Ye, Jing Jiang

    Abstract: As foundation models gain prominence, Federated Foundation Models (FedFM) have emerged as a privacy-preserving approach to collaboratively fine-tune models in federated learning (FL) frameworks using distributed datasets across clients. A key challenge for FedFM, given the versatile nature of foundation models, is addressing out-of-distribution (OOD) generalization, where unseen tasks or clients m… ▽ More

    Submitted 2 May, 2025; originally announced May 2025.

  23. arXiv:2504.20964  [pdf, other]

    cs.CL cs.AI cs.OS cs.PL cs.SE

    OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification

    Authors: Shangyu Li, Juyong Jiang, Tiancheng Zhao, Jiasi Shen

    Abstract: We introduce OSVBench, a new benchmark for evaluating Large Language Models (LLMs) in generating complete specification code pertaining to operating system kernel verification tasks. The benchmark first defines the specification generation problem into a program synthesis problem within a confined scope of syntax and semantics by providing LLMs with the programming model. The LLMs are required to… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

  24. arXiv:2504.20461  [pdf, other]

    cs.DC

    Efficient Graph-Based Approximate Nearest Neighbor Search Achieving: Low Latency Without Throughput Loss

    Authors: Jingjia Luo, Mingxing Zhang, Kang Chen, Xia Liao, Yingdi Shan, Jinlei Jiang, Yongwei Wu

    Abstract: The increase in the dimensionality of neural embedding models has enhanced the accuracy of semantic search capabilities but also amplified the computational demands for Approximate Nearest Neighbor Searches (ANNS). This complexity poses significant challenges in online and interactive services, where query latency is a critical performance metric. Traditional graph-based ANNS methods, while effect… ▽ More

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

  25. arXiv:2504.20426  [pdf, ps, other]

    cs.AI

    RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library

    Authors: Jiapeng Wang, Jinhao Jiang, Zhiqiang Zhang, Jun Zhou, Wayne Xin Zhao

    Abstract: The advancement of reasoning capabilities in Large Language Models (LLMs) requires substantial amounts of high-quality reasoning data, particularly in mathematics. Existing data synthesis methods, such as data augmentation from annotated training sets or direct question generation based on relevant knowledge points and documents, have expanded datasets but face challenges in mastering the inner lo… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

  26. arXiv:2504.20410  [pdf, other]

    cs.IT eess.SP

    Terahertz Wireless Data Center: Gaussian Beam or Airy Beam?

    Authors: Wenqi Zhao, Sergi Abadal, Guochao Song, Jiamo Jiang, Chong Han

    Abstract: Terahertz (THz) communication is emerging as a pivotal enabler for 6G and beyond wireless systems owing to its multi-GHz bandwidth. One of its novel applications is in wireless data centers, where it enables ultra-high data rates while enhancing network reconfigurability and scalability. However, due to numerous racks, supporting walls, and densely deployed antennas, the line-of-sight (LoS) path i… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

  27. arXiv:2504.19614  [pdf, other]

    cs.CV

    DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

    Authors: Junpeng Jiang, Gangyi Hong, Miao Zhang, Hengtong Hu, Kun Zhan, Rui Shao, Liqiang Nie

    Abstract: Collecting multi-view driving scenario videos to enhance the performance of 3D visual perception tasks presents significant challenges and incurs substantial costs, making generative models for realistic data an appealing alternative. Yet, the videos generated by recent works suffer from poor quality and spatiotemporal consistency, undermining their utility in advancing perception tasks under driv… ▽ More

    Submitted 28 April, 2025; originally announced April 2025.

  28. arXiv:2504.18838  [pdf, other]

    cs.CL

    Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks

    Authors: Yixin Cao, Shibo Hong, Xinze Li, Jiahao Ying, Yubo Ma, Haiyuan Liang, Yantao Liu, Zijun Yao, Xiaozhi Wang, Dan Huang, Wenxuan Zhang, Lifu Huang, Muhao Chen, Lei Hou, Qianru Sun, Xingjun Ma, Zuxuan Wu, Min-Yen Kan, David Lo, Qi Zhang, Heng Ji, Jing Jiang, Juanzi Li, Aixin Sun, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Large Language Models (LLMs) are advancing at an amazing speed and have become indispensable across academia, industry, and daily applications. To keep pace with the status quo, this survey probes the core challenges that the rise of LLMs poses for evaluation. We identify and analyze two pivotal transitions: (i) from task-specific to capability-based evaluation, which reorganizes benchmarks around… ▽ More

    Submitted 26 April, 2025; originally announced April 2025.

  29. arXiv:2504.18459  [pdf, other]

    cs.IT

    Probabilistic Shaping in MIMO: Going Beyond 1.53dB AWGN Gain With the Non-Linear Demapper

    Authors: Kirill Ivanov, Wei Yang, Jing Jiang

    Abstract: Constellation shaping is a well-established method to improve upon a regular quadrature amplitude modulation (QAM). It is known that the gain achieved by any shaping method for an additive white Gaussian noise (AWGN) channel is upper-bounded by 1.53dB. However, the situation becomes less clear in the multiple-input and multiple-output (MIMO) setting. In this paper, we study the application of pr… ▽ More

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Submitted to ISTC'25

  30. arXiv:2504.17801  [pdf, other]

    cs.NE cs.AI

    Evolution of Optimization Algorithms for Global Placement via Large Language Models

    Authors: Xufeng Yao, Jiaxi Jiang, Yuxuan Zhao, Peiyu Liao, Yibo Lin, Bei Yu

    Abstract: Optimization algorithms are widely employed to tackle complex problems, but designing them manually is often labor-intensive and requires significant expertise. Global placement is a fundamental step in electronic design automation (EDA). While analytical approaches represent the state-of-the-art (SOTA) in global placement, their core optimization algorithms remain heavily dependent on heuristics… ▽ More

    Submitted 18 April, 2025; originally announced April 2025.

  31. arXiv:2504.17787  [pdf, other]

    cs.CV

    The Fourth Monocular Depth Estimation Challenge

    Authors: Anton Obukhov, Matteo Poggi, Fabio Tosi, Ripudaman Singh Arora, Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden, Shuaihang Wang, Zhenxin Ma, Weijie Chen, Baobei Xu, Fengyu Sun, Di Xie, Jiang Zhu, Mykola Lavreniuk, Haining Guan, Qun Wu, Yupei Zeng, Chao Lu, Huanran Wang, Guangyuan Zhou, Haotian Zhang, Jianxiong Wang, Qiang Rao, et al. (32 additional authors not shown)

    Abstract: This paper presents the results of the fourth edition of the Monocular Depth Estimation Challenge (MDEC), which focuses on zero-shot generalization to the SYNS-Patches benchmark, a dataset featuring challenging environments in both natural and indoor settings. In this edition, we revised the evaluation protocol to use least-squares alignment with two degrees of freedom to support disparity and aff… ▽ More

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: To appear in CVPRW2025

  32. arXiv:2504.16109  [pdf, other]

    cs.LG

    Representation Learning for Tabular Data: A Comprehensive Survey

    Authors: Jun-Peng Jiang, Si-Yang Liu, Hao-Run Cai, Qile Zhou, Han-Jia Ye

    Abstract: Tabular data, structured as rows and columns, is among the most prevalent data types in machine learning classification and regression applications. Models for learning from tabular data have continuously evolved, with Deep Neural Networks (DNNs) recently demonstrating promising results through their capability of representation learning. In this survey, we systematically introduce the field of ta… ▽ More

    Submitted 17 April, 2025; originally announced April 2025.

  33. arXiv:2504.15785  [pdf, other]

    cs.AI

    WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

    Authors: Siyu Zhou, Tianyi Zhou, Yijun Yang, Guodong Long, Deheng Ye, Jing Jiang, Chengqi Zhang

    Abstract: Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between the prior knowledge of LLMs and the specified environment's dynamics usually bottlenecks LLMs' performance as world models. To bridge the gap, we propose a training-free "world alignment" that learns an environment's symbolic knowledge complementary to LLMs. The symbolic… ▽ More

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: Code is available at https://github.com/elated-sawyer/WALL-E

  34. arXiv:2504.15271  [pdf, other]

    cs.CV

    Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

    Authors: Guo Chen, Zhiqi Li, Shihao Wang, Jindong Jiang, Yicheng Liu, Lidong Lu, De-An Huang, Wonmin Byeon, Matthieu Le, Tuomas Rintamaki, Tyler Poon, Max Ehrlich, Tuomas Rintamaki, Tyler Poon, Tong Lu, Limin Wang, Bryan Catanzaro, Jan Kautz, Andrew Tao, Zhiding Yu, Guilin Liu

    Abstract: We introduce Eagle 2.5, a family of frontier vision-language models (VLMs) for long-context multimodal learning. Our work addresses the challenges in long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks. The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve con… ▽ More

    Submitted 21 April, 2025; originally announced April 2025.

  35. arXiv:2504.14938  [pdf]

    stat.AP cs.LG

    Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding

    Authors: Jiaxuan Jiang, Jiapeng Liu, Miłosz Kadziński, Xiuwu Liao, Jingyu Dong

    Abstract: We introduce a multiple criteria Bayesian preference learning framework incorporating behavioral cues for decision aiding. The framework integrates pairwise comparisons, response time, and attention duration to deepen insights into decision-making processes. The approach employs an additive value function model and utilizes a Bayesian framework to derive the posterior distribution of potential ran… ▽ More

    Submitted 21 April, 2025; originally announced April 2025.

  36. arXiv:2504.14600  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results

    Authors: Zheng Chen, Jingkai Wang, Kai Liu, Jue Gong, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Jianxing Zhang, Jinlong Wu, Jun Wang, Zheng Xie, Hakjae Jeon, Suejin Han, Hyung-Ju Chun, Hyunhee Park, Zhicun Yin, Junjie Chen, Ming Liu, Xiaoming Li, Chao Zhou, Wangmeng Zuo, Weixia Zhang, Dingquan Li, Kede Ma, et al. (29 additional authors not shown)

    Abstract: This paper provides a review of the NTIRE 2025 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural, realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or… ▽ More

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_RealWorld_Face_Restoration

  37. arXiv:2504.14573  [pdf, other]

    cs.RO cs.AI

    Modality Selection and Skill Segmentation via Cross-Modality Attention

    Authors: Jiawei Jiang, Kei Ota, Devesh K. Jha, Asako Kanezaki

    Abstract: Incorporating additional sensory modalities such as tactile and audio into foundational robotic models poses significant challenges due to the curse of dimensionality. This work addresses this issue through modality selection. We propose a cross-modality attention (CMA) mechanism to identify and selectively utilize the modalities that are most informative for action generation at each timestep. Fu… ▽ More

    Submitted 20 April, 2025; originally announced April 2025.

  38. arXiv:2504.13500  [pdf, ps, other]

    cs.CL

    Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning

    Authors: Jianing Wang, Jin Jiang, Yang Liu, Mengdi Zhang, Xunliang Cai

    Abstract: In this paper, we introduce a new process prejudge strategy in LLM reasoning to demonstrate that bootstrapping with process prejudge allows the LLM to adaptively anticipate the errors encountered when advancing the subsequent reasoning steps, similar to people sometimes pausing to think about what mistakes may occur and how to avoid them, rather than relying solely on trial and error. Speci… ▽ More

    Submitted 18 April, 2025; originally announced April 2025.

  39. arXiv:2504.11239  [pdf, other]

    cs.AI cs.CL

    Nondeterministic Polynomial-time Problem Challenge: An Ever-Scaling Reasoning Benchmark for LLMs

    Authors: Chang Yang, Ruiyu Wang, Junzhe Jiang, Qi Jiang, Qinggang Zhang, Yanchen Deng, Shuxin Li, Shuyue Hu, Bo Li, Florian T. Pokorny, Xiao Huang, Xinrun Wang

    Abstract: Reasoning is the fundamental capability of large language models (LLMs). Due to the rapid progress of LLMs, there are two main issues of current benchmarks: i) these benchmarks can be crushed in a short time (less than 1 year), and ii) these benchmarks may be easily hacked. To handle these issues, we propose the ever-scalingness for building the benchmarks which are uncrushable, unhackable, auto-v… ▽ More

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: Preliminary work, 10 pages for main text

  40. arXiv:2504.09973  [pdf, other]

    cs.CV

    Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu, Liqiang Nie

    Abstract: All-in-one image restoration, addressing diverse degradation types with a unified model, presents significant challenges in designing task-specific prompts that effectively guide restoration across multiple degradation scenarios. While adaptive prompt learning enables end-to-end optimization, it often yields overlapping or redundant task representations. Conversely, explicit prompts derived from p… ▽ More

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Project page: https://github.com/Aitical/CPLIR

  41. arXiv:2504.09844  [pdf, other]

    cs.DC cs.AI

    OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training

    Authors: Juntao Zhao, Qi Lu, Wei Jia, Borui Wan, Lei Zuo, Junda Feng, Jianyu Jiang, Yangrui Chen, Shuaishuai Cao, Jialing He, Kaihua Jiang, Yuanzhe Hu, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

    Abstract: Modern frameworks for training large foundation models (LFMs) employ data loaders in a data parallel paradigm. While this design offers implementation simplicity, it introduces two fundamental challenges. First, due to the quadratic computational complexity of the attention operator, the non-uniform sample distribution over data-parallel ranks leads to a significant workload imbalance among loader… ▽ More

    Submitted 13 April, 2025; originally announced April 2025.

  42. arXiv:2504.09513  [pdf, other]

    cs.CV

    DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion

    Authors: Puyu Han, Jiaju Kang, Yuhang Pan, Erting Pan, Zeyu Zhang, Qunchao Jin, Juntao Jiang, Zhichen Liu, Luqi Gong

    Abstract: Large-scale pre-trained diffusion models have produced excellent results in the field of conditional image generation. However, restoration of ancient murals, as an important downstream task in this field, poses significant challenges to diffusion model-based restoration methods due to its large defective area and scarce training samples. Conditional restoration tasks are more concerned with wheth…

    Submitted 13 April, 2025; originally announced April 2025.

  43. arXiv:2504.09311  [pdf, other]

    cs.DB

    Dupin: A Parallel Framework for Densest Subgraph Discovery in Fraud Detection on Massive Graphs (Technical Report)

    Authors: Jiaxin Jiang, Siyuan Yao, Yuchen Li, Qiange Wang, Bingsheng He, Min Chen

    Abstract: Detecting fraudulent activities in financial and e-commerce transaction networks is crucial. One effective method for this is Densest Subgraph Discovery (DSD). However, deploying DSD methods in production systems faces substantial scalability challenges due to the predominantly sequential nature of existing methods, which impedes their ability to handle large-scale transaction networks and results…

    Submitted 12 April, 2025; originally announced April 2025.

  44. arXiv:2504.06768  [pdf, other]

    cs.LG

    FedMerge: Federated Personalization via Model Merging

    Authors: Shutong Chen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

    Abstract: One global model in federated learning (FL) might not be sufficient to serve many clients with non-IID tasks and distributions. While there have been advances in FL to train multiple global models for better personalization, they only provide limited choices to clients so local finetuning is still indispensable. In this paper, we propose a novel ``FedMerge'' approach that can create a personalized…

    Submitted 24 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  45. arXiv:2504.05164  [pdf, other]

    cs.CV

    Balancing Task-invariant Interaction and Task-specific Adaptation for Unified Image Fusion

    Authors: Xingyu Hu, Junjun Jiang, Chenyang Wang, Kui Jiang, Xianming Liu, Jiayi Ma

    Abstract: Unified image fusion aims to integrate complementary information from multi-source images, enhancing image quality through a unified framework applicable to diverse fusion tasks. While treating all fusion tasks as a unified problem facilitates task-invariant knowledge sharing, it often overlooks task-specific characteristics, thereby limiting the overall performance. Existing general image fusion…

    Submitted 7 April, 2025; originally announced April 2025.

  46. arXiv:2504.04869  [pdf, other]

    cs.CV

    Content-Aware Transformer for All-in-one Image Restoration

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Image restoration has witnessed significant advancements with the development of deep learning models. Although Transformer architectures have progressed considerably in recent years, challenges remain, particularly the limited receptive field in window-based self-attention. In this work, we propose DSwinIR, a Deformable Sliding window Transformer for Image Restoration. DSwinIR introduces a novel…

    Submitted 7 April, 2025; originally announced April 2025.

  47. arXiv:2504.04519  [pdf, other]

    cs.CV

    SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation

    Authors: Junjie Jiang, Zelin Wang, Manqi Zhao, Yin Li, DongSheng Jiang

    Abstract: Segment Anything 2 (SAM2) enables robust single-object tracking using segmentation. To extend this to multi-object tracking (MOT), we propose SAM2MOT, introducing a novel Tracking by Segmentation paradigm. Unlike Tracking by Detection or Tracking by Query, SAM2MOT directly generates tracking boxes from segmentation masks, reducing reliance on detection accuracy. SAM2MOT has two key advantages: zer…

    Submitted 5 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  48. arXiv:2504.04438  [pdf, other]

    cs.NI

    DRAMA: A Dynamic Packet Routing Algorithm using Multi-Agent Reinforcement Learning with Emergent Communication

    Authors: Wang Zhang, Chenguang Liu, Yue Pi, Yong Zhang, Hairong Huang, Baoquan Rao, Yulong Ding, Shuanghua Yang, Jie Jiang

    Abstract: The continuous expansion of network data presents a pressing challenge for conventional routing algorithms. As the demand escalates, these algorithms are struggling to cope. In this context, reinforcement learning (RL) and multi-agent reinforcement learning (MARL) algorithms emerge as promising solutions. However, the urgency and importance of the problem are clear, as existing RL/MARL-based routi…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: This article has been accepted by IJCNN 2025

  49. arXiv:2504.03164  [pdf, other]

    cs.CV cs.AI

    NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving

    Authors: Kexin Tian, Jingrui Mao, Yunlong Zhang, Jiwan Jiang, Yang Zhou, Zhengzhong Tu

    Abstract: Recent advancements in Vision-Language Models (VLMs) have demonstrated strong potential for autonomous driving tasks. However, their spatial understanding and reasoning, key capabilities for autonomous driving, still exhibit significant limitations. Notably, none of the existing benchmarks systematically evaluate VLMs' spatial reasoning capabilities in driving scenarios. To fill this gap, we propose…

    Submitted 6 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  50. arXiv:2504.02921  [pdf, other]

    cs.CL

    HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse

    Authors: Yuwei An, Yihua Cheng, Seo Jin Park, Junchen Jiang

    Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing the performance of large language models (LLMs) by integrating external knowledge into the generation process. A key component of RAG pipelines is the reranker, which selects the most relevant documents from a pool of retrieved candidates and significantly improves the quality of the generated responses. While re…

    Submitted 3 April, 2025; originally announced April 2025.