Showing 1–50 of 652 results for author: Pan, L

  1. arXiv:2507.01335  [pdf, ps, other]

    cs.CL cs.AI

    LEDOM: An Open and Fundamental Reverse Language Model

    Authors: Xunjian Yin, Sitao Cheng, Yuxi Xie, Xinyu Hu, Li Lin, Xinyi Wang, Liangming Pan, William Yang Wang, Xiaojun Wan

    Abstract: We introduce LEDOM, the first purely reverse language model, trained autoregressively on 435B tokens with 2B and 7B parameter variants, which processes sequences in reverse temporal order through previous token prediction. For the first time, we present the reverse language model as a potential foundational model across general tasks, accompanied by a set of intriguing examples and insights. Based…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Work in progress

  2. arXiv:2507.01006  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Authors: GLM-V Team, Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, et al. (54 additional authors not shown)

    Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the fi…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.13672  [pdf, ps, other]

    cs.LG

    The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep Reinforcement Learning

    Authors: Jiashun Liu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

    Abstract: Off-policy deep reinforcement learning (RL) typically leverages replay buffers for reusing past experiences during learning. This can help improve sample efficiency when the collected data is informative and aligned with the learning objectives; when that is not the case, it can have the effect of "polluting" the replay buffer with data which can exacerbate optimization challenges in addition to w…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)

  4. arXiv:2506.13326  [pdf, ps, other]

    cs.CV cs.HC

    VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation

    Authors: Bo Pan, Yixiao Fu, Ke Wang, Junyu Lu, Lunke Pan, Ziyang Qian, Yuhan Chen, Guoliang Wang, Yitao Zhou, Li Zheng, Yinghao Tang, Zhen Wen, Yuchen Wu, Junhua Lu, Biao Zhu, Minfeng Zhu, Bo Zhang, Wei Chen

    Abstract: Data visualization generation using Large Language Models (LLMs) has shown promising results but often produces suboptimal visualizations that require human intervention for improvement. In this work, we introduce VIS-Shepherd, a specialized Multimodal Large Language Model (MLLM)-based critic to evaluate and provide feedback for LLM-generated data visualizations. At the core of our approach is a f…

    Submitted 16 June, 2025; originally announced June 2025.

  5. arXiv:2506.09385  [pdf, ps, other]

    cs.CV

    ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model

    Authors: Jialong Zuo, Yongtai Deng, Mengdan Tan, Rui Jin, Dongyue Wu, Nong Sang, Liang Pan, Changxin Gao

    Abstract: In real-world scenarios, person re-identification (ReID) expects to identify a person-of-interest via the descriptive query, regardless of whether the query is a single modality or a combination of multiple modalities. However, existing methods and datasets remain constrained to limited modalities, failing to meet this requirement. Therefore, we investigate a new challenging problem called Omni Mul…

    Submitted 11 June, 2025; originally announced June 2025.

  6. arXiv:2506.06005  [pdf, other]

    cs.LG

    LightGTS: A Lightweight General Time Series Forecasting Model

    Authors: Yihang Wang, Yuying Qiu, Peng Chen, Yang Shu, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

    Abstract: Existing works on general time series forecasting build foundation models with heavy model parameters through large-scale multi-source pre-training. These models achieve superior generalization ability across various datasets at the cost of significant computational burdens and limitations in resource-constrained scenarios. This paper introduces LightGTS, a lightweight general time series forecast…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: Accepted by the 42nd International Conference on Machine Learning (ICML 2025)

  7. arXiv:2506.03185  [pdf, ps, other]

    eess.IV cs.AI cs.CV q-bio.QM

    DLiPath: A Benchmark for the Comprehensive Assessment of Donor Liver Based on Histopathological Image Dataset

    Authors: Liangrui Pan, Xingchen Li, Zhongyi Chen, Ling Chu, Shaoliang Peng

    Abstract: Pathologists' comprehensive evaluation of donor liver biopsies provides crucial information for accepting or discarding potential grafts. However, rapidly and accurately obtaining these assessments intraoperatively poses a significant challenge for pathologists. Features in donor liver biopsies, such as portal tract fibrosis, total steatosis, macrovesicular steatosis, and hepatocellular ballooning…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: Submitted to ACM MM2025

  8. arXiv:2506.00096  [pdf, ps, other]

    q-bio.GN cs.AI

    PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset

    Authors: Liangrui Pan, Qingchun Liang, Shen Zhao, Songqing Fan, Shaoliang Peng

    Abstract: Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparities in medical resources and the high cost of genomic assays, using artificial intelligence to infer these mutations and exon variants from routine histopathology images could greatly facilitate precision thera…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: Submitted to NIPS2025

  9. arXiv:2505.24061  [pdf, ps, other]

    cs.LG

    Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

    Authors: Jiashun Liu, Zihao Wu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

    Abstract: Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss, which impairs their ability to adapt to new data and learn continually. A common method to quantify and address this issue is the tau-dormant neuron ratio, which uses activation statistics to measure the expressive ability of neurons. While effective for simple MLP-based agents, this approach loses statistical p…

    Submitted 29 May, 2025; originally announced May 2025.

  10. arXiv:2505.23898  [pdf, ps, other]

    astro-ph.GA

    The Density Distribution of Compressively-Forced Supersonic Turbulence Depends on the Driving Correlation Time

    Authors: Philipp Grete, Evan Scannapieco, Marcus Brüggen, Liubin Pan

    Abstract: Supersonic turbulence plays a critical role in shaping astrophysical systems, from molecular clouds to the circumgalactic medium. Key properties of this turbulence include the Mach number, driving scale, and nature of the driving mechanism, which can be solenoidal (divergence-free), compressive (curl-free), or a mix of the two. A less studied property is the correlation time of the driving acceler…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 16 pages, 9 figures, ApJ in press, comments welcome

  11. arXiv:2505.23653  [pdf, ps, other]

    cs.LG

    How does Transformer Learn Implicit Reasoning?

    Authors: Jiaran Ye, Zijun Yao, Zhidian Huang, Liangming Pan, Jinxin Liu, Yushi Bai, Amy Xin, Liu Weichuan, Xiaoyin Che, Lei Hou, Juanzi Li

    Abstract: Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly -- producing correct answers without explicitly verbalizing intermediate steps -- but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three…

    Submitted 29 May, 2025; originally announced May 2025.

  12. arXiv:2505.20691  [pdf, ps, other]

    cs.LG cs.AI

    Evidential Deep Active Learning for Semi-Supervised Classification

    Authors: Shenkai Zhao, Xinao Zhang, Lipeng Pan, Xiaobin Xu, Danilo Pelusi

    Abstract: Semi-supervised classification based on active learning has made significant progress, but the existing methods often ignore the uncertainty estimation (or reliability) of the prediction results during the learning process, which makes it questionable whether the selected samples can effectively update the model. Hence, this paper proposes an evidential deep active learning approach for semi-super…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures

    ACM Class: I.2.6

  13. arXiv:2505.18761  [pdf, other]

    cs.CL cs.AI cs.LG

    How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark

    Authors: Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Wang, Liangming Pan

    Abstract: We introduce Grade School Math with Distracting Context (GSM-DC), a synthetic benchmark to evaluate Large Language Models' (LLMs) reasoning robustness against systematically controlled irrelevant context (IC). GSM-DC constructs symbolic reasoning graphs with precise distractor injections, enabling rigorous, reproducible evaluation. Our experiments demonstrate that LLMs are significantly sensitive…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 15 pages, 9 figures, 4 tables

  14. arXiv:2505.18325   

    cs.AI cs.LG

    Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary

    Authors: Licheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, Jun Zhou, Zhixuan Chu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they often refuse to answer legitimate queries, a phenomenon known as overrefusal. Overrefusal typically stems from over-conservative safety alignment, causing models to treat many reasonable prompts as potentially risky. To systematically understand this issue, we probe and leverage the models'…

    Submitted 29 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: We have identified significant errors in the results presented in this paper, specifically in the evaluation sections concerning the DPO training of LLaMA2 and Qwen2.5, as well as in the representation space visualization section. Given the extent of these issues, we intend to substantially revise the manuscript's content and structure. Hence, we request to withdraw it from arXiv at this time

  15. arXiv:2505.17872  [pdf, ps, other]

    cs.LG cs.AI

    Mixture of Low Rank Adaptation with Partial Parameter Sharing for Time Series Forecasting

    Authors: Licheng Pan, Zhichao Chen, Haoxuan Li, Guangyi Liu, Zhijian Xu, Zhaoran Liu, Hao Wang, Ying Wei

    Abstract: Multi-task forecasting has become the standard approach for time-series forecasting (TSF). However, we show that it suffers from an Expressiveness Bottleneck, where predictions at different time steps share the same representation, leading to unavoidable errors even with optimal representations. To address this issue, we propose a two-stage framework: first, pre-train a foundation model for one-st…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  16. arXiv:2505.17847  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    TransDF: Time-Series Forecasting Needs Transformed Label Alignment

    Authors: Hao Wang, Licheng Pan, Zhichao Chen, Xu Chen, Qingyang Dai, Lei Wang, Haoxuan Li, Zhouchen Lin

    Abstract: Training time-series forecasting models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) an excessive number of tasks, which increases with the forecast horizon and complicates optimiza…

    Submitted 23 May, 2025; originally announced May 2025.

  17. arXiv:2505.17621  [pdf, ps, other]

    cs.LG

    Navigate the Unknown: Enhancing LLM Reasoning with Intrinsic Motivation Guided Exploration

    Authors: Jingtong Gao, Ling Pan, Yejing Wang, Rui Zhong, Chi Lu, Qingpeng Cai, Peng Jiang, Xiangyu Zhao

    Abstract: Reinforcement learning (RL) has emerged as a pivotal method for improving the reasoning capabilities of Large Language Models (LLMs). However, prevalent RL approaches such as Proximal Policy Optimization (PPO) and Group-Regularized Policy Optimization (GRPO) face critical limitations due to their reliance on sparse outcome-based rewards and inadequate mechanisms for incentivizing exploration. Thes…

    Submitted 27 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  18. arXiv:2505.17618  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Scaling Image and Video Generation via Test-Time Evolutionary Search

    Authors: Haoran He, Jiajun Liang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Ling Pan

    Abstract: As the marginal cost of scaling computation (data and parameters) during model pre-training continues to increase substantially, test-time scaling (TTS) has emerged as a promising direction for improving generative model performance by allocating additional computation at inference time. While TTS has demonstrated significant success across multiple language tasks, there remains a notable gap in u…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 37 pages. Project: https://tinnerhrhe.github.io/evosearch

  19. arXiv:2505.17250  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models

    Authors: Razvan-Gabriel Dumitru, Darius Peteleaza, Vikas Yadav, Liangming Pan

    Abstract: Large language models excel at complex tasks by breaking down problems into structured reasoning steps. However, reasoning traces often extend beyond reaching a correct answer, causing wasted computation, reduced readability, and hallucinations. To address this, we introduce a novel hyperparameter-free conciseness score used as a reward signal within a reinforcement learning framework to guide mod…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 25 pages, 18 figures, and 6 tables

    ACM Class: I.2.7; I.2.0

  20. arXiv:2505.12738  [pdf, ps, other]

    cs.LG cs.AI cs.SI

    EpiLLM: Unlocking the Potential of Large Language Models in Epidemic Forecasting

    Authors: Chenghua Gong, Rui Sun, Yuhao Zheng, Juyuan Zhang, Tianjun Gu, Liming Pan, Linyuan Lv

    Abstract: Advanced epidemic forecasting is critical for enabling precision containment strategies, highlighting its strategic importance for public health security. While recent advances in Large Language Models (LLMs) have demonstrated effectiveness as foundation models for domain-specific tasks, their potential for epidemic forecasting remains largely unexplored. In this paper, we introduce EpiLLM, a nove…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 18 pages

  21. arXiv:2505.10083  [pdf, ps, other]

    cs.LG

    ChronoSteer: Bridging Large Language Model and Time Series Foundation Model via Synthetic Data

    Authors: Chengsen Wang, Qi Qi, Zhongwen Rao, Lujia Pan, Jingyu Wang, Jianxin Liao

    Abstract: Conventional forecasting methods rely on unimodal time series data, limiting their ability to exploit rich textual information. Recently, large language models (LLMs) and time series foundation models (TSFMs) have demonstrated powerful capability in textual reasoning and temporal modeling, respectively. Integrating the strengths of both to construct a multimodal model that concurrently leverages b…

    Submitted 15 May, 2025; originally announced May 2025.

  22. arXiv:2505.07034  [pdf, ps, other]

    cs.IR

    NetSight: Graph Attention Based Traffic Forecasting in Computer Networks

    Authors: Jinming Xing, Guoheng Sun, Hui Sun, Linchao Pan, Shakir Mahmood, Xuanhao Luo, Muhammad Shahzad

    Abstract: The traffic in today's networks is increasingly influenced by the interactions among network nodes as well as by the temporal fluctuations in the demands of the nodes. Traditional statistical prediction methods are becoming obsolete due to their inability to address the non-linear and dynamic spatio-temporal dependencies present in today's network traffic. The most promising direction of research…

    Submitted 11 May, 2025; originally announced May 2025.

  23. arXiv:2505.05621  [pdf, other]

    cs.CV

    A Preliminary Study for GPT-4o on Image Restoration

    Authors: Hao Yang, Yan Yang, Ruikun Zhang, Liyuan Pan

    Abstract: OpenAI's GPT-4o model, integrating multi-modal inputs and outputs within an autoregressive architecture, has demonstrated unprecedented performance in image generation. In this work, we investigate its potential impact on the image restoration community. We present the first systematic evaluation of GPT-4o across diverse restoration tasks. Our experiments reveal that, although restoration outputs…

    Submitted 17 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  24. arXiv:2505.05056  [pdf, other]

    cs.CL cs.AI

    Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations

    Authors: Linrong Pan, Chenglong Jiang, Gaoze Hou, Ying Gao

    Abstract: This paper reports the construction of the Teochew-Wild, a speech corpus of the Teochew dialect. The corpus includes 18.9 hours of in-the-wild Teochew speech data from multiple speakers, covering both formal and colloquial expressions, with precise orthographic and pinyin annotations. Additionally, we provide supplementary text processing tools and resources to propel research and applications in…

    Submitted 8 May, 2025; originally announced May 2025.

  25. arXiv:2505.03655  [pdf, other]

    cs.IR cs.AI

    Counterfactual Inference for Eliminating Sentiment Bias in Recommender Systems

    Authors: Le Pan, Yuanjiang Cao, Chengkai Huang, Wenjie Zhang, Lina Yao

    Abstract: Recommender Systems (RSs) aim to provide personalized recommendations for users. A newly discovered bias, known as sentiment bias, uncovers a common phenomenon within Review-based RSs (RRSs): the recommendation accuracy of users or items with negative reviews deteriorates compared with users or items with positive reviews. Critical users and niche items are disadvantaged by such unfair recommendat…

    Submitted 6 May, 2025; originally announced May 2025.

  26. arXiv:2505.03645  [pdf, other]

    quant-ph cond-mat.dis-nn

    Expedited thermalization dynamics in incommensurate systems

    Authors: Mingdi Xu, Zijun Wei, Xiang-Ping Jiang, Lei Pan

    Abstract: We study the thermalization dynamics of a quantum system embedded in an incommensurate potential and coupled to a Markovian thermal reservoir. The dephasing induced by the bath drives the system toward an infinite-temperature steady state, erasing all initial information, including signatures of localization. We find that initially localized states can relax to the homogeneous steady state faster t…

    Submitted 25 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures, comments are welcome

  27. arXiv:2504.21737  [pdf]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Observation of Intrinsic and LED Light-Enhanced Memristor Performance in In-Plane Ferroelectric NbOI2

    Authors: Zheng Hao, Gaolei Zhao, Haoran Li, Jizhang Zhang, Jiabin Liu, Fanyi Kong, Konstantin Kozadaev, Yongjiang Li, Xue Han, Hong Li, Huolin Huang, Changsen Sun, Alexei Tolstik, Andrey Novitsky, Lujun Pan, Dawei Li

    Abstract: Two-dimensional (2D) layered ferroelectrics, as an emerging area of research, have attracted extensive attention, while memristors based on new 2D ferroelectric materials have yet to be fully explored, thereby limiting their applications in modern nanoelectronics. In this work, we report the observation of intrinsic memristive behavior in a newly discovered 2D in-plane ferroelectric material, NbOI…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 20 pages, 5 figures

  28. arXiv:2504.14905  [pdf, other]

    cs.CL

    CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs

    Authors: Yingming Zheng, Xiaoliang Liu, Peng Wu, Li Pan

    Abstract: The rapid spread of misinformation, driven by digital media and AI-generated content, has made automatic claim verification essential. Traditional methods, which depend on expert-annotated evidence, are labor-intensive and not scalable. Although recent automated systems have improved, they still struggle with complex claims that require nuanced reasoning. To address this, we propose CRAVE, a Confl…

    Submitted 21 April, 2025; originally announced April 2025.

  29. arXiv:2504.14238  [pdf, other]

    cs.CV

    Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network

    Authors: Lu Pan, Yu-Hsuan Huang, Hongxia Xie, Cheng Zhang, Hongwei Zhao, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Reflective documents often suffer from specular highlights under ambient lighting, severely hindering text readability and degrading overall visual quality. Although recent deep learning methods show promise in highlight removal, they remain suboptimal for document images, primarily due to the lack of dedicated datasets and tailored architectural designs. To tackle these challenges, we present Doc…

    Submitted 19 April, 2025; originally announced April 2025.

    Comments: main paper with 8 pages, conference

  30. arXiv:2504.13407  [pdf, other]

    cs.CV cs.AI

    LoRA-Based Continual Learning with Constraints on Critical Parameter Changes

    Authors: Shimou Ling, Liang Zhang, Jiangwei Zhao, Lili Pan, Hongliang Li

    Abstract: LoRA-based continual learning represents a promising avenue for leveraging pre-trained models in downstream continual learning tasks. Recent studies have shown that orthogonal LoRA tuning effectively mitigates forgetting. However, this work unveils that under orthogonal LoRA tuning, the critical parameters for pre-tasks still change notably after learning post-tasks. To address this problem, we di…

    Submitted 17 April, 2025; originally announced April 2025.

  31. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  32. arXiv:2504.12709  [pdf, other]

    cs.CV

    Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving

    Authors: Shumin Wang, Zhuoran Yang, Lidian Wang, Zhipeng Tang, Heng Li, Lehan Pan, Sha Zhang, Jie Peng, Jianmin Ji, Yanyong Zhang

    Abstract: The significant achievements of pre-trained models leveraging large volumes of data in the field of NLP and 2D vision inspire us to explore the potential of extensive data pre-training for 3D perception in autonomous driving. Toward this goal, this paper proposes to utilize massive unlabeled data from heterogeneous datasets to pre-train 3D perception models. We introduce a self-supervised pre-trai…

    Submitted 17 April, 2025; originally announced April 2025.

  33. arXiv:2504.09993  [pdf, other]

    cs.LG

    AimTS: Augmented Series and Image Contrastive Learning for Time Series Classification

    Authors: Yuxuan Chen, Shanshan Huang, Yunyao Cheng, Peng Chen, Zhongwen Rao, Yang Shu, Bin Yang, Lujia Pan, Chenjuan Guo

    Abstract: Time series classification (TSC) is an important task in time series analysis. Existing TSC methods mainly train on each single domain separately, suffering from a degradation in accuracy when the samples for training are insufficient in certain domains. The pre-training and fine-tuning paradigm provides a promising direction for solving this problem. However, time series from different domains ar…

    Submitted 14 April, 2025; originally announced April 2025.

  34. arXiv:2504.09444  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.quant-gas

    Dissipation induced localization-delocalization transition in a flat band

    Authors: Mingdi Xu, Zijun Wei, Xiang-Ping Jiang, Lei Pan

    Abstract: The interplay between dissipation and localization in quantum systems has garnered significant attention due to its potential to manipulate transport properties and induce phase transitions. In this work, we explore the dissipation-induced extended-localized transition in a flat band model, where the system's asymptotic state can be controlled by tailored dissipative operators. By analyzing the st…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 10 pages, 9 figures, comments are welcome

  35. arXiv:2503.23874  [pdf]

    cond-mat.mtrl-sci cond-mat.other physics.chem-ph physics.comp-ph

    He-Mg compounds and helium-driven nonmetal transition in metallic magnesium

    Authors: Y. S. Huang, H. X. Song, Q. D. Hao, X. L. Pan, D. Wang, H. Wang, Y. F. Wang, Y. Sun, Hua Y. Geng

    Abstract: The polymorphism and mechanism of helium compounds are crucial for understanding the physical and chemical nature of He-bearing materials under pressure. Here, we predict two new types of He-bearing compounds, MgHe and MgnHe (n = 6, 8, 10, 15, 18), being formed above 750 GPa by unbiased ab initio structure search. An unexpected bandgap is opened up in MgHe at as low as around 200 GPa. This is the…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 22 pages, 5 figures, with supporting materials

    Journal ref: Phys. Rev. B 110, 214102 (2024)

  36. arXiv:2503.19912  [pdf, other]

    cs.CV cs.LG cs.RO

    SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: LiDAR representation learning has emerged as a promising approach to reducing reliance on costly and labor-intensive human annotations. While existing methods primarily focus on spatial alignment between LiDAR and camera sensors, they often overlook the temporal dynamics critical for capturing motion and scene continuity in driving scenarios. To address this limitation, we propose SuperFlow++, a n…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Preprint; 15 pages, 6 figures, 10 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  37. arXiv:2503.19901  [pdf, other]

    cs.CV

    TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

    Authors: Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang

    Abstract: Synthesizing diverse and physically plausible Human-Scene Interactions (HSI) is pivotal for both computer animation and embodied AI. Despite encouraging progress, current methods mainly focus on developing separate controllers, each specialized for a specific interaction task. This significantly hinders the ability to tackle a wide variety of challenging HSI tasks that require the integration of m…

    Submitted 3 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  38. arXiv:2503.15573  [pdf, other]

    cs.LG

    Task-Specific Data Selection for Instruction Tuning via Monosemantic Neuronal Activations

    Authors: Da Ma, Gonghu Shang, Zhi Chen, Libo Qin, Yijie Luo, Lei Pan, Shuai Fan, Lu Chen, Kai Yu

    Abstract: Instruction tuning improves the ability of large language models (LLMs) to follow diverse human instructions, but achieving strong performance on specific target tasks remains challenging. A critical bottleneck is selecting the most relevant data to maximize task-specific performance. Existing data selection approaches include unstable influence-based methods and more stable distribution alignment…

    Submitted 16 May, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: preprint, (20 pages, 7 figures, 13 tables)

  39. arXiv:2503.15451  [pdf, other]

    cs.CV

    MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

    Authors: Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang

    Abstract: This paper addresses the challenge of text-conditioned streaming motion generation, which requires us to predict the next-step human pose based on variable-length historical motions and incoming texts. Existing methods struggle to achieve streaming motion generation, e.g., diffusion models are constrained by pre-defined motion lengths, while GPT-based methods suffer from delayed response and error…

    Submitted 16 April, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: Project Page: https://zju3dv.github.io/MotionStreamer/

  40. arXiv:2503.13799  [pdf, other]

    cs.CV cs.AI

    SMILE: a Scale-aware Multiple Instance Learning Method for Multicenter STAS Lung Cancer Histopathology Diagnosis

    Authors: Liangrui Pan, Xiaoyu Li, Yutao Dou, Qiya Song, Jiadi Luo, Qingchun Liang, Shaoliang Peng

    Abstract: Spread through air spaces (STAS) represents a newly identified aggressive pattern in lung cancer, which is known to be associated with adverse prognostic factors and complex pathological features. Pathologists currently rely on time-consuming manual assessments, which are highly subjective and prone to variation. This highlights the urgent need for automated and precise diagnostic solutions. 2,97…

    Submitted 17 March, 2025; originally announced March 2025.

  41. arXiv:2503.11117  [pdf, other]

    cs.CV

    Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering

    Authors: Kaixuan Jiang, Yang Liu, Weixing Chen, Jingzhou Luo, Ziliang Chen, Ling Pan, Guanbin Li, Liang Lin

    Abstract: Embodied Question Answering (EQA) is a challenging task in embodied intelligence that requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions. However, current EQA approaches suffer from critical limitations in exploration efficiency, dataset design, and evaluation metrics. Moreover, existing datasets often in…

    Submitted 23 May, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

  42. arXiv:2503.06885  [pdf, other]

    cs.CV

    ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks

    Authors: Yan Yang, Dongxu Li, Haoning Wu, Bei Chen, Liu Liu, Liyuan Pan, Junnan Li

    Abstract: Solving expert-level multimodal tasks is a key milestone towards general intelligence. As the capabilities of multimodal large language models (MLLMs) continue to improve, evaluation of such advanced multimodal intelligence becomes necessary yet challenging. In this work, we introduce ProBench, a benchmark of open-ended user queries that require professional expertise and advanced reasoning. ProBe…

    Submitted 9 March, 2025; originally announced March 2025.

  43. arXiv:2503.06708  [pdf, other]

    cs.CL

    Alignment for Efficient Tool Calling of Large Language Models

    Authors: Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Lu Chen, Kai Yu

    Abstract: Recent advancements in tool learning have enabled large language models (LLMs) to integrate external tools, enhancing their task performance by expanding their knowledge boundaries. However, relying on tools often introduces tradeoffs between performance, speed, and cost, with LLMs sometimes exhibiting overreliance and overconfidence in tool usage. This paper addresses the challenge of aligning LL…

    Submitted 9 March, 2025; originally announced March 2025.

  44. arXiv:2503.05185  [pdf, other]

    q-fin.CP cs.AI cs.MM

    FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance

    Authors: Fengbin Zhu, Junfeng Li, Liangming Pan, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat-Seng Chua

    Abstract: Finance decision-making often relies on in-depth data analysis across various data sources, including financial tables, news articles, stock prices, etc. In this work, we introduce FinTMMBench, the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation (RAG) systems in finance. Built from heterologous data of NASDAQ 100 companies, FinTMMBench offers…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: Under review

  45. arXiv:2503.05120  [pdf, ps, other]

    astro-ph.IM astro-ph.GA astro-ph.SR physics.chem-ph

    Computing Anharmonic Infrared Spectra of Polycyclic Aromatic Hydrocarbons Using Machine-Learning Molecular Dynamics

    Authors: Xinghong Mai, Zhao Wang, Lijun Pan, Johannes Schorghuber, Peter Kovacs, Jesus Carrete, Georg K. H. Madsen

    Abstract: We introduce a machine learning molecular dynamics (MLMD) approach to calculate the anharmonic infrared (IR) absorption spectra of polycyclic aromatic hydrocarbons (PAHs), key carriers of interstellar aromatic IR bands. This method accounts for temperature effects in a molecule-specific way and achieves accuracy comparable to conventional quantum chemical calculations at a fraction of the cost, sc…

    Submitted 30 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  46. arXiv:2503.01347  [pdf, other]

    cs.CV

    Spatial Transcriptomics Analysis of Spatially Dense Gene Expression Prediction

    Authors: Ruikun Zhang, Yan Yang, Liyuan Pan

    Abstract: Spatial transcriptomics (ST) measures gene expression at fine-grained spatial resolution, offering insights into tissue molecular landscapes. Previous methods for spatial gene expression prediction usually crop spots of interest from pathology tissue slide images, and learn a model that maps each spot to a single gene expression profile. However, it fundamentally loses spatial resolution of gene e…

    Submitted 3 March, 2025; originally announced March 2025.

  47. arXiv:2502.18040  [pdf, other]

    cs.SI cs.AI

    AutoCas: Autoregressive Cascade Predictor in Social Networks via Large Language Models

    Authors: Yuhao Zheng, Chenghua Gong, Rui Sun, Juyuan Zhang, Liming Pan, Linyuan Lv

    Abstract: Popularity prediction in information cascades plays a crucial role in social computing, with broad applications in viral marketing, misinformation control, and content recommendation. However, information propagation mechanisms, user behavior, and temporal activity patterns exhibit significant diversity, necessitating a foundational model capable of adapting to such variations. At the same time, t…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 12 pages

  48. arXiv:2502.17494  [pdf, other]

    cs.IR cs.AI cs.LG

    External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

    Authors: Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li , et al. (80 additional authors not shown)

    Abstract: Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in indus…

    Submitted 23 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: Accepted by the ACM Web Conference (WWW) 2025 Industrial Track as Oral Presentation

  49. arXiv:2502.15823  [pdf, other]

    cs.LG cs.AI cs.CL cs.FL

    InductionBench: LLMs Fail in the Simplest Complexity Class

    Authors: Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang

    Abstract: Large language models (LLMs) have shown remarkable improvements in reasoning and many existing benchmarks have been addressed by models such as o1 and o3 either fully or partially. However, a majority of these benchmarks emphasize deductive reasoning, including mathematical and coding tasks in which rules such as mathematical axioms or programming syntax are clearly defined, based on which LLMs ca…

    Submitted 13 May, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 25 pages, 10 figures, more details including examples and prompts are added

  50. arXiv:2502.15637  [pdf, other]

    cs.LG cs.AI stat.ML

    Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification

    Authors: Vasilii Feofanov, Songkang Wen, Marius Alonso, Romain Ilbert, Hongbo Guo, Malik Tiomoko, Lujia Pan, Jianfeng Zhang, Ievgen Redko

    Abstract: In recent years, there has been increasing interest in developing foundation models for time series data that can generalize across diverse downstream tasks. While numerous forecasting-oriented foundation models have been introduced, there is a notable scarcity of models tailored for time series classification. To address this gap, we present Mantis, a new open-source foundation model for time ser…

    Submitted 21 February, 2025; originally announced February 2025.