-
TransGI: Real-Time Dynamic Global Illumination With Object-Centric Neural Transfer Model
Authors:
Yijie Deng,
Lei Han,
Lu Fang
Abstract:
Neural rendering algorithms have revolutionized computer graphics, yet their impact on real-time rendering under arbitrary lighting conditions remains limited due to strict latency constraints in practical applications. The key challenge lies in formulating a compact yet expressive material representation. To address this, we propose TransGI, a novel neural rendering method for real-time, high-fidelity global illumination. It comprises an object-centric neural transfer model for material representation and a radiance-sharing lighting system for efficient illumination. Traditional BSDF representations and spatial neural material representations lack expressiveness, requiring thousands of ray evaluations to converge to noise-free colors. Conversely, real-time methods trade quality for efficiency by supporting only diffuse materials. In contrast, our object-centric neural transfer model achieves compactness and expressiveness through an MLP-based decoder and vertex-attached latent features, supporting glossy effects with low memory overhead. For dynamic, varying lighting conditions, we introduce local light probes capturing scene radiance, coupled with an across-probe radiance-sharing strategy for efficient probe generation. We implemented our method in a real-time rendering engine, combining compute shaders and CUDA-based neural networks. Experimental results demonstrate that our method achieves real-time performance of less than 10 ms to render a frame and significantly improved rendering quality compared to baseline methods.
Submitted 11 June, 2025;
originally announced June 2025.
-
A Call for Collaborative Intelligence: Why Human-Agent Systems Should Precede AI Autonomy
Authors:
Henry Peng Zou,
Wei-Chieh Huang,
Yaozu Wu,
Chunyu Miao,
Dongyuan Li,
Aiwei Liu,
Yue Zhou,
Yankai Chen,
Weizhi Zhang,
Yangning Li,
Liancheng Fang,
Renhe Jiang,
Philip S. Yu
Abstract:
Recent improvements in large language models (LLMs) have led many researchers to focus on building fully autonomous AI agents. This position paper questions whether this approach is the right path forward, as these autonomous systems still have problems with reliability, transparency, and understanding the actual requirements of humans. We suggest a different approach: LLM-based Human-Agent Systems (LLM-HAS), where AI works with humans rather than replacing them. By keeping humans involved to provide guidance, answer questions, and maintain control, these systems can be more trustworthy and adaptable. Looking at examples from healthcare, finance, and software development, we show how human-AI teamwork can handle complex tasks better than AI working alone. We also discuss the challenges of building these collaborative systems and offer practical solutions. This paper argues that progress in AI should not be measured by how independent systems become, but by how well they can work with humans. The most promising future for AI is not in systems that take over human roles, but in those that enhance human capabilities through meaningful partnership.
Submitted 11 June, 2025;
originally announced June 2025.
-
Micro-Macro Modeling of Polymeric Fluids with Multi-Bead Polymer Chain
Authors:
Xuelian Bao,
Lidong Fang,
Huaxiong Huang,
Zilong Song,
Shixin Xu
Abstract:
This work extends the classical dumbbell (two-bead) model of polymer chains to a more detailed multi-bead representation, where each polymer chain consists of $N$ beads connected by $N-1$ springs. We develop a thermodynamically consistent micro-macro model based on the energy variational method to describe the coupled dynamics of polymer configurations and fluid flow. The resulting framework captures complex microscopic behaviors, such as bond stretching and alignment under flow, and links them to macroscopic stress responses.
Submitted 9 June, 2025;
originally announced June 2025.
-
NR4DER: Neural Re-ranking for Diversified Exercise Recommendation
Authors:
Xinghe Cheng,
Xufang Zhou,
Liangda Fang,
Chaobo He,
Yuyu Zhou,
Weiqi Luo,
Zhiguo Gong,
Quanlong Guan
Abstract:
With the widespread adoption of online education platforms, an increasing number of students are gaining new knowledge through Massive Open Online Courses (MOOCs). Exercise recommendation has made strides toward improving student learning outcomes. However, existing methods not only struggle with high dropout rates but also fail to match the diverse learning pace of students. They frequently face difficulties in adjusting to inactive students' learning patterns and in accommodating individualized learning paces, resulting in limited accuracy and diversity in recommendations. To tackle these challenges, we propose Neural Re-ranking for Diversified Exercise Recommendation (in short, NR4DER). NR4DER first leverages the mLSTM model to improve the effectiveness of the exercise filter module. It then employs a sequence enhancement method to enhance the representation of inactive students and accurately matches students with exercises of appropriate difficulty. Finally, it utilizes neural re-ranking to generate diverse recommendation lists based on individual students' learning histories. Extensive experimental results indicate that NR4DER significantly outperforms existing methods across multiple real-world datasets and effectively caters to the diverse learning pace of students.
Submitted 1 June, 2025;
originally announced June 2025.
-
Towards Effective Code-Integrated Reasoning
Authors:
Fei Bai,
Yingqian Min,
Beichen Zhang,
Zhipeng Chen,
Wayne Xin Zhao,
Lei Fang,
Zheng Liu,
Zhongyuan Wang,
Ji-Rong Wen
Abstract:
In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external code tools effectively, which is supported by tool-augmented reinforcement learning (RL) through interactive learning. Despite its benefits, tool-augmented RL can still suffer from potential instability in the learning dynamics. In light of this challenge, we present a systematic approach to improving the training effectiveness and stability of tool-augmented RL for code-integrated reasoning. Specifically, we develop enhanced training strategies that balance exploration and stability, progressively building tool-use capabilities while improving reasoning performance. Through extensive experiments on five mainstream mathematical reasoning benchmarks, our model demonstrates significant performance improvements over multiple competitive baselines. Furthermore, we conduct an in-depth analysis of the mechanism and effect of code-integrated reasoning, revealing several key insights, such as the extension of the model's capability boundaries and the simultaneous improvement of reasoning efficiency through code integration. All data and code for reproducing this work are available at: https://github.com/RUCAIBox/CIR.
Submitted 30 May, 2025;
originally announced May 2025.
-
MUSE: Model-Agnostic Tabular Watermarking via Multi-Sample Selection
Authors:
Liancheng Fang,
Aiwei Liu,
Henry Peng Zou,
Yankai Chen,
Hengrui Zhang,
Zhongfen Deng,
Philip S. Yu
Abstract:
We introduce MUSE, a watermarking algorithm for tabular generative models. Previous approaches typically leverage DDIM invertibility to watermark tabular diffusion models, but tabular diffusion models exhibit significantly poorer invertibility compared to other modalities, compromising performance. Simultaneously, tabular diffusion models require substantially less computation than other modalities, enabling a multi-sample selection approach to tabular generative model watermarking. MUSE embeds watermarks by generating multiple candidate samples and selecting one based on a specialized scoring function, without relying on model invertibility. Our theoretical analysis establishes the relationship between watermark detectability, candidate count, and dataset size, allowing precise calibration of watermarking strength. Extensive experiments demonstrate that MUSE achieves state-of-the-art watermark detectability and robustness against various attacks while maintaining data quality, and remains compatible with any tabular generative model supporting repeated sampling, effectively addressing key challenges in tabular data watermarking. Specifically, it reduces the distortion rates on fidelity metrics by 81-89%, while achieving a 1.0 [email protected]%FPR detection rate. Implementation of MUSE can be found at https://github.com/fangliancheng/MUSE.
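The multi-sample selection idea described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the keyed SHA-256 score, the two-column Gaussian `sample_row` stand-in for a tabular generative model, the candidate count of 4, and all function names are hypothetical.

```python
import hashlib
import random


def keyed_score(row, key):
    """Map a row to a pseudo-random score in [0, 1] using a secret key (hypothetical scoring function)."""
    payload = (key + "|" + ",".join(f"{v:.2f}" for v in row)).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sample_row(rng):
    """Stand-in for one draw from any tabular generative model that supports repeated sampling."""
    return (rng.gauss(0.0, 1.0), rng.gauss(5.0, 2.0))


def watermarked_row(rng, key, num_candidates):
    """Draw several candidate rows and keep the one with the highest keyed score."""
    candidates = [sample_row(rng) for _ in range(num_candidates)]
    return max(candidates, key=lambda row: keyed_score(row, key))


def mean_score(rows, key):
    """Detection statistic: rows that were not selected with this key should average close to 0.5."""
    return sum(keyed_score(row, key) for row in rows) / len(rows)


rng = random.Random(0)
key = "secret-key"
plain = [sample_row(rng) for _ in range(500)]
marked = [watermarked_row(rng, key, num_candidates=4) for _ in range(500)]
print(mean_score(plain, key), mean_score(marked, key))
```

In this toy setting the selection step needs no model inversion: detection only recomputes the keyed score on the released rows, and the gap between the two printed means grows with the number of candidates and with the number of rows inspected.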
Submitted 30 May, 2025;
originally announced May 2025.
-
Interturn Fault Detection in IPMSMs: Two Adaptive Observer-based Solutions
Authors:
Romeo Ortega,
Alexey Bobtsov,
Leyan Fang,
Oscar Texis-Loaiza,
Johannes Schiffer
Abstract:
In this paper we address the problem of online detection of inter-turn short-circuit faults (ITSCFs) that occur in permanent magnet synchronous motors (PMSMs). We propose two solutions to this problem: (i) a very simple linear observer and (ii) a generalized parameter estimation-based observer that incorporates a high-performance estimator -- with both observers detecting the short-circuit current and the fault intensity. Although the first solution guarantees the detection of the fault exponentially fast, the rate of convergence is fully determined by the motor parameters and, in some cases, may be too slow. The second observer, on the other hand, ensures finite convergence time under the weakest assumption of interval excitation. To make the observers adaptive, we develop a parameter estimator that, in the case of isotropic PMSMs, estimates online (exponentially fast) the resistance and inductance of the motor. It should be underscored that, in contrast with existing observers (including the widely popular Kalman filter) that provide indirect information about the fault current, our observers provide explicit information -- namely the amplitude of the fault current. The performance of both observers, in their linear and generalized parameter estimation-based versions, is illustrated with realistic simulation studies.
Submitted 29 May, 2025;
originally announced May 2025.
-
Voltage Control of the Boost Converter: PI vs. Nonlinear Passivity-based Control
Authors:
Leyan Fang,
Romeo Ortega,
Robert Griñó
Abstract:
We carry out a detailed analysis of direct voltage control of a Boost converter feeding a simple resistive load. First, we prove that using a classical PI control to stabilize a desired equilibrium leads to a very complicated dynamic behavior consisting of two equilibrium points, one of them always unstable for all PI gains and circuit parameter values. Interestingly, the second equilibrium point may be rendered stable -- but for all tuning gains leading to an extremely large value of the circuit current and the controller integrator state. Moreover, if we neglect the resistive effect of the inductor, there is only one equilibrium and it is always unstable. From a practical point of view, it is important to note that the only useful equilibrium point is that of minimum current and that, in addition, there is always a resistive component in the inductor either by its parasitic resistance or by the resistive component of the output impedance of the previous stage. In opposition to this troublesome scenario we recall three nonlinear voltage-feedback controllers that ensure asymptotic stability of the desired equilibrium with simple gain tuning rules, an easily defined domain of attraction and smooth transient behavior. Two of them are very simple, nonlinear, static voltage feedback rules, while the third one is a variation of the PID scheme called PID-Passivity-based Control (PBC). In its original formulation PID-PBC requires full state measurement, but we present a modified version that incorporates a current observer. All three nonlinear controllers are designed following the principles of PBC, which has had enormous success in many engineering applications.
Submitted 29 May, 2025;
originally announced May 2025.
-
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis
Authors:
You Wang,
Li Fang,
Hao Zhu,
Fei Hu,
Long Ye,
Zhan Ma
Abstract:
Neural Radiance Fields (NeRF) have transformed novel view synthesis by modeling scene-specific volumetric representations directly from images. While generalizable NeRF models can generate novel views across unknown scenes by learning latent ray representations, their performance heavily depends on a large number of multi-view observations. However, with limited input views, these methods experience significant degradation in rendering quality. To address this limitation, we propose GoLF-NRT: a Global and Local feature Fusion-based Neural Rendering Transformer. GoLF-NRT enhances generalizable neural rendering from few input views by leveraging a 3D transformer with efficient sparse attention to capture global scene context. In parallel, it integrates local geometric features extracted along the epipolar line, enabling high-quality scene reconstruction from as few as 1 to 3 input views. Furthermore, we introduce an adaptive sampling strategy based on attention weights and kernel regression, improving the accuracy of transformer-based neural rendering. Extensive experiments on public datasets show that GoLF-NRT achieves state-of-the-art performance across varying numbers of input views, highlighting the effectiveness and superiority of our approach. Code is available at https://github.com/KLMAV-CUC/GoLF-NRT.
Submitted 26 May, 2025;
originally announced May 2025.
-
Depth-Guided Bundle Sampling for Efficient Generalizable Neural Radiance Field Reconstruction
Authors:
Li Fang,
Hao Zhu,
Longlong Chen,
Fei Hu,
Long Ye,
Zhan Ma
Abstract:
Recent advancements in generalizable novel view synthesis have achieved impressive quality through interpolation between nearby views. However, rendering high-resolution images remains computationally intensive due to the need for dense sampling of all rays. Recognizing that natural scenes are typically piecewise smooth and sampling all rays is often redundant, we propose a novel depth-guided bundle sampling strategy to accelerate rendering. By grouping adjacent rays into a bundle and sampling them collectively, a shared representation is generated for decoding all rays within the bundle. To further optimize efficiency, our adaptive sampling strategy dynamically allocates samples based on depth confidence, concentrating more samples in complex regions while reducing them in smoother areas. When applied to ENeRF, our method achieves up to a 1.27 dB PSNR improvement and a 47% increase in FPS on the DTU dataset. Extensive experiments on synthetic and real-world datasets demonstrate state-of-the-art rendering quality and up to 2x faster rendering compared to existing generalizable methods. Code is available at https://github.com/KLMAV-CUC/GDB-NeRF.
Submitted 26 May, 2025;
originally announced May 2025.
-
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
Authors:
Huatong Song,
Jinhao Jiang,
Wenqing Tian,
Zhipeng Chen,
Yuhuan Wu,
Jiahao Zhao,
Yingqian Min,
Wayne Xin Zhao,
Lei Fang,
Ji-Rong Wen
Abstract:
Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods often are costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome-supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
Submitted 22 May, 2025;
originally announced May 2025.
-
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
Authors:
Shuang Sun,
Huatong Song,
Yuhao Wang,
Ruiyang Ren,
Jinhao Jiang,
Junjie Zhang,
Fei Bai,
Jia Deng,
Wayne Xin Zhao,
Zheng Liu,
Lei Fang,
Zhongyuan Wang,
Ji-Rong Wen
Abstract:
Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios requiring multi-step reasoning and iterative information retrieval. However, existing approaches face critical limitations: they lack high-quality training trajectories or suffer from distributional mismatches in simulated environments and prohibitive computational costs for real-world deployment. This paper introduces SimpleDeepSearcher, a lightweight yet effective framework that bridges this gap through strategic data engineering rather than complex training paradigms. Our approach synthesizes high-quality training data by simulating realistic user interactions in live web search environments, coupled with a multi-criteria curation strategy that optimizes the diversity and quality of both the input and output sides. Experiments on five benchmarks across diverse domains demonstrate that SFT on only 871 curated samples yields significant improvements over RL-based baselines. Our work establishes SFT as a viable pathway by systematically addressing the data-scarce bottleneck, offering practical insights for efficient deep search systems. Our code is available at https://github.com/RUCAIBox/SimpleDeepSearcher.
Submitted 25 May, 2025; v1 submitted 22 May, 2025;
originally announced May 2025.
-
IPENS: Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion
Authors:
Wentao Song,
He Huang,
Youqiang Sun,
Fang Qu,
Jiaqi Zhang,
Longhui Fang,
Yuwei Hao,
Chenyang Peng
Abstract:
Advanced plant phenotyping technologies play a crucial role in targeted trait improvement and accelerating intelligent breeding. Due to the species diversity of plants, existing methods heavily rely on large-scale high-precision manually annotated data. For self-occluded objects at the grain level, unsupervised methods often prove ineffective. This study proposes IPENS, an interactive unsupervised multi-target point cloud extraction method. The method utilizes radiance field information to lift 2D masks, which are segmented by SAM2 (Segment Anything Model 2), into 3D space for target point cloud extraction. A multi-target collaborative optimization strategy is designed to effectively resolve the single-interaction multi-target segmentation challenge. Experimental validation demonstrates that IPENS achieves a grain-level segmentation accuracy (mIoU) of 63.72% on a rice dataset, with strong phenotypic estimation capabilities: grain volume prediction yields R2 = 0.7697 (RMSE = 0.0025), leaf surface area R2 = 0.84 (RMSE = 18.93), and leaf length and width predictions achieve R2 = 0.97 and 0.87 (RMSE = 1.49 and 0.21). On a wheat dataset, IPENS further improves segmentation accuracy to 89.68% (mIoU), with equally outstanding phenotypic estimation performance: spike volume prediction achieves R2 = 0.9956 (RMSE = 0.0055), leaf surface area R2 = 1.00 (RMSE = 0.67), and leaf length and width predictions reach R2 = 0.99 and 0.92 (RMSE = 0.23 and 0.15). This method provides a non-invasive, high-quality phenotyping extraction solution for rice and wheat. Without requiring annotated data, it rapidly extracts grain-level point clouds within 3 minutes through simple single-round interactions on images for multiple targets, demonstrating significant potential to accelerate intelligent breeding efficiency.
Submitted 19 May, 2025;
originally announced May 2025.
-
An Extensive Study on Text Serialization Formats and Methods
Authors:
Wang Wei,
Li Na,
Zhang Lei,
Liu Fang,
Chen Hao,
Yang Xiuying,
Huang Lei,
Zhao Min,
Wu Gang,
Zhou Jie,
Xu Jing,
Sun Tao,
Ma Li,
Zhu Qiang,
Hu Jun,
Guo Wei,
He Yong,
Gao Yuan,
Lin Dan,
Zheng Yi,
Shi Li
Abstract:
Text serialization is a fundamental concept in modern computing, enabling the conversion of complex data structures into a format that can be easily stored, transmitted, and reconstructed. This paper provides an extensive overview of text serialization, exploring its importance, prevalent formats, underlying methods, and comparative performance characteristics. We dive into the advantages and disadvantages of various text-based serialization formats, including JSON, XML, YAML, and CSV, examining their structure, readability, verbosity, and suitability for different applications. The paper also discusses the common methods involved in the serialization and deserialization processes, such as parsing techniques and the role of schemas. To illustrate the practical implications of choosing a serialization format, we present hypothetical performance results in the form of tables, comparing formats based on metrics like serialization and deserialization speed and resulting data size. The discussion analyzes these results, highlighting the trade-offs involved in selecting a text serialization format for specific use cases. This work aims to provide a comprehensive resource for understanding and applying text serialization in various computational domains.
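The kind of format comparison this abstract describes can be sketched with Python's standard library. This is an illustrative example only and does not reproduce the paper's tables: the sample `records` and the helper names are assumptions, and only JSON and CSV are covered because they are available in the standard library.

```python
import csv
import io
import json

# Hypothetical sample data: a small list of flat records.
records = [
    {"id": 1, "name": "Ada", "score": 9.5},
    {"id": 2, "name": "Linus", "score": 8.0},
]


def to_json(rows):
    """Serialize rows to a JSON string; field names repeat in every record."""
    return json.dumps(rows)


def to_csv(rows):
    """Serialize rows to CSV text; field names appear once, in the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Deserialize CSV text; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


json_text = to_json(records)
csv_text = to_csv(records)
print(len(json_text), len(csv_text))
print(json.loads(json_text) == records)
print(from_csv(csv_text)[0])
```

For flat records like these, the CSV text is shorter because keys are not repeated, but the JSON round trip restores numeric types while the CSV round trip returns strings, which is one concrete instance of the trade-off between verbosity and fidelity.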
Submitted 10 May, 2025;
originally announced May 2025.
-
CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability
Authors:
Han Peng,
Jinhao Jiang,
Zican Dong,
Wayne Xin Zhao,
Lei Fang
Abstract:
Advancements in Large Language Models (LLMs) have extended their input context length, yet they still struggle with retrieval and reasoning in long-context inputs. Existing methods propose to utilize the prompt strategy and retrieval head to alleviate this limitation. However, they still face challenges in balancing retrieval precision and recall, impacting their efficacy in answering questions. To address this, we introduce $\textbf{CAFE}$, a two-stage coarse-to-fine method to enhance multi-document question-answering capacities. By gradually eliminating the negative impacts of background and distracting documents, CAFE makes the responses more reliant on the evidence documents. Initially, a coarse-grained filtering method leverages retrieval heads to identify and rank relevant documents. Then, a fine-grained steering method guides attention to the most relevant content. Experiments across benchmarks show CAFE outperforms baselines, achieving up to 22.1% and 13.7% SubEM improvement over SFT and RAG methods on the Mistral model, respectively.
Submitted 15 May, 2025;
originally announced May 2025.
-
Rethinking Invariance in In-context Learning
Authors:
Lizhe Fang,
Yifei Wang,
Khashayar Gatmiry,
Lei Fang,
Yisen Wang
Abstract:
In-Context Learning (ICL) has emerged as a pivotal capability of auto-regressive large language models, yet it is hindered by a notable sensitivity to the ordering of context examples regardless of their mutual independence. To address this issue, recent studies have introduced several variant algorithms of ICL that achieve permutation invariance. However, many of these do not exhibit comparable performance with the standard auto-regressive ICL algorithm. In this work, we identify two crucial elements in the design of an invariant ICL algorithm: information non-leakage and context interdependence, which are not simultaneously achieved by any of the existing methods. These investigations lead us to the proposed Invariant ICL (InvICL), a methodology designed to achieve invariance in ICL while ensuring the two properties. Empirically, our findings reveal that InvICL surpasses previous models, both invariant and non-invariant, in most benchmark datasets, showcasing superior generalization capabilities across varying input lengths. Code is available at https://github.com/PKU-ML/InvICL.
Submitted 8 May, 2025;
originally announced May 2025.
-
A Survey on Large Language Model based Human-Agent Systems
Authors:
Henry Peng Zou,
Wei-Chieh Huang,
Yaozu Wu,
Yankai Chen,
Chunyu Miao,
Hoang Nguyen,
Yue Zhou,
Weizhi Zhang,
Liancheng Fang,
Langzhou He,
Yangning Li,
Dongyuan Li,
Renhe Jiang,
Xue Liu,
Philip S. Yu
Abstract:
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-Systems.
Submitted 20 May, 2025; v1 submitted 1 May, 2025;
originally announced May 2025.
-
Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting
Authors:
Yijie Hong,
Xiaofei Yin,
Xinzhong Wang,
Yi Tu,
Ya Guo,
Sufeng Duan,
Weiqiang Wang,
Lingyong Fang,
Depeng Wang,
Huijia Zhu
Abstract:
Large Vision Language Models have demonstrated impressive versatile capabilities through extensive multimodal pre-training, but face significant limitations when incorporating specialized knowledge domains beyond their training distribution. These models struggle with a fundamental dilemma: direct adaptation approaches that inject domain-specific knowledge often trigger catastrophic forgetting of foundational visual-linguistic abilities. We introduce Structured Dialogue Fine-Tuning (SDFT), an approach that effectively injects domain-specific knowledge while minimizing catastrophic forgetting. Drawing inspiration from supervised fine-tuning in LLMs and subject-driven personalization in text-to-image diffusion models, our method employs a three-phase dialogue structure: Foundation Preservation reinforces pre-trained visual-linguistic alignment through caption tasks; Contrastive Disambiguation introduces carefully designed counterfactual examples to maintain semantic boundaries; and Knowledge Specialization embeds specialized information through chain-of-thought reasoning. Experimental results across multiple domains confirm SDFT's effectiveness in balancing specialized knowledge acquisition with general capability retention. Our key contributions include a data-centric dialogue template that balances foundational alignment with targeted knowledge integration, a weighted multi-turn supervision framework, and comprehensive evaluation across diverse knowledge types.
Submitted 27 April, 2025;
originally announced May 2025.
-
Knowledge Distillation and Dataset Distillation of Large Language Models: Emerging Trends, Challenges, and Future Directions
Authors:
Luyang Fang,
Xiaowei Yu,
Jiazhang Cai,
Yongkai Chen,
Shushan Wu,
Zhengliang Liu,
Zhenyuan Yang,
Haoran Lu,
Xilin Gong,
Yufang Liu,
Terry Ma,
Wei Ruan,
Ali Abbasi,
Jing Zhang,
Tao Wang,
Ehsan Latif,
Wei Liu,
Wei Zhang,
Soheil Kolouri,
Xiaoming Zhai,
Dajiang Zhu,
Wenxuan Zhong,
Tianming Liu,
Ping Ma
Abstract:
The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and linguistic diversity. We first examine key methodologies in KD, such as task-specific alignment, rationale-based training, and multi-teacher frameworks, alongside DD techniques that synthesize compact, high-impact datasets through optimization-based gradient matching, latent space regularization, and generative synthesis. Building on these foundations, we explore how integrating KD and DD can produce more effective and scalable compression strategies. Together, these approaches address persistent challenges in model scalability, architectural heterogeneity, and the preservation of emergent LLM abilities. We further highlight applications across domains such as healthcare and education, where distillation enables efficient deployment without sacrificing performance. Despite substantial progress, open challenges remain in preserving emergent reasoning and linguistic diversity, enabling efficient adaptation to continually evolving teacher models and datasets, and establishing comprehensive evaluation protocols. By synthesizing methodological innovations, theoretical foundations, and practical insights, our survey charts a path toward sustainable, resource-efficient LLMs through the tighter integration of KD and DD principles.
Submitted 20 April, 2025;
originally announced April 2025.
-
A Multi-UAV Formation Obstacle Avoidance Method Combined Improved Simulated Annealing and Adaptive Artificial Potential Field
Authors:
Bo Ma,
Yi Ji,
Liyong Fang
Abstract:
The traditional Artificial Potential Field (APF) method exhibits limitations in its force distribution: excessive attraction when UAVs are far from the target may cause collisions with obstacles, while insufficient attraction near the goal often results in failure to reach the target. Furthermore, APF is highly susceptible to local minima, compromising motion reliability in complex environments. To address these challenges, this paper presents a novel hybrid obstacle avoidance algorithm, Deflected Simulated Annealing-Adaptive Artificial Potential Field (DSA-AAPF), which combines an improved simulated annealing mechanism with an enhanced APF model. The proposed approach integrates a Leader-Follower distributed formation strategy with the APF framework, where the resultant force formulation is redefined to smooth UAV trajectories. An adaptive gravitational gain function is introduced to dynamically adjust UAV velocity based on environmental context, and a fast-converging controller ensures accurate and efficient convergence to the target. Moreover, a directional deflection mechanism is embedded within the simulated annealing process, enabling UAVs to escape local minima caused by semi-enclosed obstacles through continuous rotational motion. The simulation results, covering formation reconfiguration, complex obstacle avoidance, and entrapment escape, demonstrate the feasibility, robustness, and superiority of the proposed DSA-AAPF algorithm.
Submitted 15 April, 2025;
originally announced April 2025.
-
Enhancing Features in Long-tailed Data Using Large Vision Model
Authors:
Pengxiao Han,
Changkun Ye,
Jinguang Tong,
Cuicui Jiang,
Jie Hong,
Li Fang,
Xuesong Li
Abstract:
Language-based foundation models, such as large language models (LLMs) or large vision-language models (LVLMs), have been widely studied in long-tailed recognition. However, the need for linguistic data is not applicable to all practical tasks. In this study, we aim to explore using large vision models (LVMs) or visual foundation models (VFMs) to enhance long-tailed data features without any language information. Specifically, we extract features from the LVM and fuse them with features in the baseline network's map and latent space to obtain the augmented features. Moreover, we design several prototype-based losses in the latent space to further exploit the potential of the augmented features. In the experimental section, we validate our approach on two benchmark datasets: ImageNet-LT and iNaturalist2018.
Submitted 22 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method
Authors:
Dongqi Fu,
Yada Zhu,
Zhining Liu,
Lecheng Zheng,
Xiao Lin,
Zihao Li,
Liri Fang,
Katherine Tieu,
Onkar Bhardwaj,
Kommy Weldemariam,
Hanghang Tong,
Hendrik Hamann,
Jingrui He
Abstract:
Climate science studies the structure and dynamics of Earth's climate system and seeks to understand how climate changes over time, where the data is usually stored in the format of time series, recording the climate features, geolocation, time attributes, etc. Recently, much research attention has been paid to the climate benchmarks. In addition to the most common task of weather forecasting, several pioneering benchmark works are proposed for extending the modality, such as domain-specific applications like tropical cyclone intensity prediction and flash flood damage estimation, or climate statement and confidence level in the format of natural language. To further motivate the artificial general intelligence development for climate science, in this paper, we first contribute a multi-modal climate benchmark, i.e., ClimateBench-M, which aligns (1) the time series climate data from ERA5, (2) extreme weather events data from NOAA, and (3) satellite image data from NASA HLS based on a unified spatial-temporal granularity. Second, under each data modality, we also propose a simple but strong generative method that could produce competitive performance in weather forecasting, thunderstorm alerts, and crop segmentation tasks in the proposed ClimateBench-M. The data and code of ClimateBench-M are publicly available at https://github.com/iDEA-iSAIL-Lab-UIUC/ClimateBench-M.
Submitted 9 April, 2025;
originally announced April 2025.
-
The Mini-SiTian Array: first-two-year operation
Authors:
Min He,
Hong Wu,
Liang Ge,
Jian-feng Tian,
Zheng Wang,
Hai-yang Mu,
Yu Zhang,
Yang Huang,
Jie Zheng,
Zhou Fan,
Zheng-yang Li,
Hong-hui Gu,
Heng-geng Han,
Kai Xiao,
Zhi-rui Li,
Jun-jie Jin,
Bei-chuan Wang,
Jun Ma,
Jin-hang Zou,
Ying Wu,
Jiu-peng Guo,
Li-guo Fang,
Zhi-gang Hou,
Bo-wen Zhang,
Yun-fei Xu
, et al. (48 additional authors not shown)
Abstract:
The SiTian project, designed to utilize 60 telescopes distributed across multiple sites in China, is a next-generation time-domain survey initiative. As a pathfinder for the SiTian project, the Mini-SiTian (MST) has been proposed and implemented to test the SiTian's brain and data pipeline, and to evaluate the feasibility of its technology and science cases. Mounted at the Xinglong Observatory, the MST project comprises three 30 cm telescopes and has been operated since Nov. 2022. Each telescope of the MST possesses a large field of view, covering $2.29^{\circ}$ $\times$ $1.53^{\circ}$ FOV, and is equipped with $g'$, $r'$ and $i'$ filters, respectively. Acting as the pioneer of the forthcoming SiTian project, the MST is dedicated to the discovery of variable stars, transients, and outburst events, and has already obtained some interesting scientific results. In this paper, we will summarize the first-two-year operation of the MST project.
Submitted 2 April, 2025;
originally announced April 2025.
-
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
Authors:
Haoxiang Sun,
Yingqian Min,
Zhipeng Chen,
Wayne Xin Zhao,
Lei Fang,
Zheng Liu,
Zhongyuan Wang,
Ji-Rong Wen
Abstract:
In recent years, the rapid development of large reasoning models has resulted in the saturation of existing benchmarks for evaluating mathematical reasoning, highlighting the urgent need for more challenging and rigorous evaluation frameworks. To address this gap, we introduce OlymMATH, a novel Olympiad-level mathematical benchmark, designed to rigorously test the complex reasoning capabilities of LLMs. OlymMATH features 200 meticulously curated problems, each manually verified and available in parallel English and Chinese versions. The problems are systematically organized into two distinct difficulty tiers: (1) AIME-level problems (easy) that establish a baseline for mathematical reasoning assessment, and (2) significantly more challenging problems (hard) designed to push the boundaries of current state-of-the-art models. In our benchmark, these problems span four core mathematical fields, each including a verifiable numerical solution to enable objective, rule-based evaluation. Empirical results underscore the significant challenge presented by OlymMATH, with state-of-the-art models including DeepSeek-R1, OpenAI's o3-mini and Gemini 2.5 Pro Exp demonstrating notably limited accuracy on the hard subset. Furthermore, the benchmark facilitates comprehensive bilingual assessment of mathematical reasoning abilities, a critical dimension that remains largely unaddressed in mainstream mathematical reasoning benchmarks. We release the benchmark, evaluation code, detailed results and a data visualization tool at https://github.com/RUCAIBox/OlymMATH.
Submitted 19 May, 2025; v1 submitted 27 March, 2025;
originally announced March 2025.
-
Analysis of Learning-based Offshore Wind Power Prediction Models with Various Feature Combinations
Authors:
Linhan Fang,
Fan Jiang,
Ann Mary Toms,
Xingpeng Li
Abstract:
Accurate wind speed prediction is crucial for designing and selecting sites for offshore wind farms. This paper investigates the effectiveness of various machine learning models in predicting offshore wind power for a site near the Gulf of Mexico by analyzing meteorological data. After collecting and preprocessing meteorological data, nine different input feature combinations were designed to assess their impact on wind power predictions at multiple heights. The results show that using wind speed as the output feature improves prediction accuracy by approximately 10% compared to using wind power as the output. In addition, the improvement of multi-feature input compared with single-feature input is not obvious mainly due to the poor correlation among key features and limited generalization ability of models. These findings underscore the importance of selecting appropriate output features and highlight considerations for using machine learning in wind power forecasting, offering insights that could guide future wind power prediction models and conversion techniques.
Submitted 10 March, 2025;
originally announced March 2025.
-
Optimization-based method for conjugate heat transfer problems
Authors:
Liang Fang,
Xiandong Liu,
Lei Zhang
Abstract:
We propose a numerical approach for solving conjugate heat transfer problems using the finite volume method. This approach combines a semi-implicit scheme for fluid flow, governed by the incompressible Navier-Stokes equations, with an optimization-based approach for heat transfer across the fluid-solid interface. In the semi-implicit method, the convective term in the momentum equation is treated explicitly, ensuring computational efficiency, while maintaining stability when a CFL condition involving fluid velocity is satisfied. Heat exchange between the fluid and solid domains is formulated as a constrained optimization problem, which is efficiently solved using a sequential quadratic programming method. Numerical results are presented to demonstrate the effectiveness and performance of the proposed approach.
Submitted 16 March, 2025;
originally announced March 2025.
-
Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation
Authors:
Lexin Fang,
Yunyang Xu,
Xiang Ma,
Xuemei Li,
Caiming Zhang
Abstract:
Deep learning has achieved significant advancements in medical image segmentation, but existing models still face challenges in accurately segmenting lesion regions. The main reason is that some lesion regions in medical images have unclear boundaries, irregular shapes, and small tissue density differences, leading to label ambiguity. However, the existing model treats all data equally without taking quality differences into account in the training process, resulting in noisy labels negatively impacting model training and unstable feature representations. In this paper, a data-driven alternating learning (DALE) paradigm is proposed to optimize the model's training process, achieving stable and high-precision segmentation. The paradigm focuses on two key points: (1) reducing the impact of noisy labels, and (2) calibrating unstable representations. To mitigate the negative impact of noisy labels, a loss consistency-based collaborative optimization method is proposed, and its effectiveness is theoretically demonstrated. Specifically, the label confidence parameters are introduced to dynamically adjust the influence of labels of different confidence levels during model training, thus reducing the influence of noise labels. To calibrate the learning bias of unstable representations, a distribution alignment method is proposed. This method restores the underlying distribution of unstable representations, thereby enhancing the discriminative capability of fuzzy region representations. Extensive experiments on various benchmarks and model backbones demonstrate the superiority of the DALE paradigm, achieving an average performance improvement of up to 7.16%.
Submitted 14 March, 2025;
originally announced March 2025.
-
Efficient Multi-Task Inferencing: Model Merging with Gromov-Wasserstein Feature Alignment
Authors:
Luyang Fang,
Ehsan Latif,
Haoran Lu,
Yifan Zhou,
Ping Ma,
Xiaoming Zhai
Abstract:
Automatic scoring of student responses enhances efficiency in education, but deploying a separate neural network for each task increases storage demands, maintenance efforts, and redundant computations. To address these challenges, this paper introduces the Gromov-Wasserstein Scoring Model Merging (GW-SMM) method, which merges models based on feature distribution similarities measured via the Gromov-Wasserstein distance. Our approach begins by extracting features from student responses using individual models, capturing both item-specific context and unique learned representations. The Gromov-Wasserstein distance then quantifies the similarity between these feature distributions, identifying the most compatible models for merging. Models exhibiting the smallest pairwise distances, typically in pairs or trios, are merged by combining only the shared layers preceding the classification head. This strategy results in a unified feature extractor while preserving separate classification heads for item-specific scoring. We validated our approach against human expert knowledge and a GPT-o1-based merging method. GW-SMM consistently outperformed both, achieving a higher micro F1 score, macro F1 score, exact match accuracy, and per-label accuracy. The improvements in micro F1 and per-label accuracy were statistically significant compared to GPT-o1-based merging (p=0.04, p=0.01). Additionally, GW-SMM reduced storage requirements by half without compromising much accuracy, demonstrating its computational efficiency alongside reliable scoring performance.
Submitted 12 March, 2025;
originally announced March 2025.
-
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Authors:
Huatong Song,
Jinhao Jiang,
Yingqian Min,
Jie Chen,
Zhipeng Chen,
Wayne Xin Zhao,
Lei Fang,
Ji-Rong Wen
Abstract:
Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose R1-Searcher, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
Submitted 18 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Authors:
Zhipeng Chen,
Yingqian Min,
Beichen Zhang,
Jie Chen,
Jinhao Jiang,
Daixuan Cheng,
Wayne Xin Zhao,
Zheng Liu,
Xu Miao,
Yang Lu,
Lei Fang,
Zhongyuan Wang,
Ji-Rong Wen
Abstract:
In this report, we present the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, enhancing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models. This approach achieves a remarkable accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.
Submitted 6 March, 2025;
originally announced March 2025.
-
Simultaneous direct sum decompositions of several multivariate polynomials
Authors:
Lishan Fang,
Hua-Lin Huang,
Lili Liao
Abstract:
We consider the problem of simultaneous direct sum decomposition of a set of multivariate polynomials. To this end, we extend Harrison's center theory for a single homogeneous polynomial to this broader setting. It is shown that the center of a set of polynomials is a special Jordan algebra, and simultaneous direct sum decompositions of the given polynomials are in bijection with complete sets of orthogonal idempotents of their center algebra. Several examples are provided to illustrate the performance of this method.
Submitted 8 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Simultaneous block diagonalization of a set of symmetric matrices via congruence
Authors:
Lishan Fang,
Hua-Lin Huang,
Jiayan Huang
Abstract:
This article studies canonical forms derived from the finest simultaneous block diagonalization of a set of symmetric matrices via congruence. Our technique relies on Harrison's center theory, which is extended from a single higher degree form to multiple quadratic forms, hence a set of symmetric matrices. The algebraic structures of centers and the bijective relationship between the simultaneous block diagonalization via congruence and complete sets of orthogonal idempotents of centers are investigated. We provide an algorithm that mainly uses standard linear algebra tasks and several examples to demonstrate its effectiveness. In addition, this technique can be extended verbatim to the simultaneous block diagonalization of a set of Hermitian matrices via $*$-congruence.
Submitted 2 March, 2025;
originally announced March 2025.
-
Show Me Why It's Correct: Saving 1/3 of Debugging Time in Program Repair with Interactive Runtime Comparison
Authors:
Ruixin Wang,
Zhongkai Zhao,
Le Fang,
Nan Jiang,
Yiling Lou,
Lin Tan,
Tianyi Zhang
Abstract:
Automated Program Repair (APR) holds the promise of alleviating the burden of debugging and fixing software bugs. Despite this, developers still need to manually inspect each patch to confirm its correctness, which is tedious and time-consuming. This challenge is exacerbated in the presence of plausible patches, which accidentally pass test cases but may not correctly fix the bug. To address this challenge, we propose an interactive approach called iFix to facilitate patch understanding and comparison based on their runtime difference. iFix performs static analysis to identify runtime variables related to the buggy statement and captures their runtime values during execution for each patch. These values are then aligned across different patch candidates, allowing users to compare and contrast their runtime behavior. To evaluate iFix, we conducted a within-subjects user study with 28 participants. Compared with manual inspection and a state-of-the-art interactive patch filtering technique, iFix reduced participants' task completion time by 36% and 33% while also improving their confidence by 50% and 20%, respectively. Besides, quantitative experiments demonstrate that iFix improves the ranking of correct patches by at least 39% compared with other patch ranking methods and is generalizable to different APR tools.
Submitted 1 March, 2025;
originally announced March 2025.
-
Corporate Fraud Detection in Rich-yet-Noisy Financial Graph
Authors:
Shiqi Wang,
Zhibo Zhang,
Libing Fang,
Cam-Tu Nguyen,
Wenzhong Li
Abstract:
Corporate fraud detection aims to automatically recognize companies that conduct wrongful activities such as fraudulent financial statements or illegal insider trading. Previous learning-based methods fail to effectively integrate rich interactions in the company network. To close this gap, we collect 18-year financial records in China to form three graph datasets with fraud labels. We analyze the characteristics of the financial graphs, highlighting two pronounced issues: (1) information overload: the dominance of (noisy) non-company nodes over company nodes hinders the message-passing process in Graph Convolution Networks (GCN); and (2) hidden fraud: there exists a large percentage of possible undetected violations in the collected data. The hidden fraud problem will introduce noisy labels in the training dataset and compromise fraud detection results. To handle such challenges, we propose a novel graph-based method, namely, Knowledge-enhanced GCN with Robust Two-stage Learning (${\rm KeGCN}_{R}$), which leverages Knowledge Graph Embeddings to mitigate the information overload and effectively learns rich representations. The proposed model adopts a two-stage learning method to enhance robustness against hidden frauds. Extensive experimental results not only confirm the importance of interactions but also show the superiority of ${\rm KeGCN}_{R}$ over a number of strong baselines in terms of fraud detection effectiveness and robustness.
Submitted 29 May, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency
Authors:
Henry Peng Zou,
Zhengyao Gu,
Yue Zhou,
Yankai Chen,
Weizhi Zhang,
Liancheng Fang,
Yibo Wang,
Yangning Li,
Kay Liu,
Philip S. Yu
Abstract:
Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data: it classifies an input instance by considering not only the model's prediction on that instance but also its predictions on neighboring unlabeled instances. We evaluate TestNUC across eight diverse datasets, spanning intent classification, topic mining, domain discovery, and emotion detection, demonstrating its consistent superiority over baseline methods such as standard prompting and self-consistency. Furthermore, TestNUC can be seamlessly integrated with existing test-time computing approaches, substantially boosting their performance. Our analysis reveals that TestNUC scales effectively with increasing amounts of unlabeled data and performs robustly across different embedding models, making it practical for real-world applications. Our code is available at https://github.com/HenryPengZou/TestNUC.
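Editorial sketch (not the authors' code): the abstract describes classifying an instance from the model's prediction on it together with predictions on neighboring unlabeled instances. One simple way to illustrate that idea is a majority vote over the k nearest neighbors in embedding space. The function names `cosine` and `neighbor_vote`, the cosine-similarity neighbor search, and the unweighted vote are all assumptions made for illustration; the paper's actual aggregation rule may differ.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbor_vote(query_emb, query_pred, pool_embs, pool_preds, k=3):
    """Hypothetical aggregation: majority vote over the model's own prediction
    and its predictions on the k most similar unlabeled instances."""
    ranked = sorted(
        range(len(pool_embs)),
        key=lambda i: cosine(query_emb, pool_embs[i]),
        reverse=True,
    )
    votes = Counter([query_pred])  # the query's own prediction counts once
    votes.update(pool_preds[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: the query was predicted "billing", but its three closest
# unlabeled neighbors were all predicted "refund".
pool_embs = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.05], [0.0, 1.0]]
pool_preds = ["refund", "refund", "refund", "billing"]
label = neighbor_vote([1.0, 0.0], "billing", pool_embs, pool_preds, k=3)
```

In this toy case the neighbors outvote the query's own prediction; with `k=0` the function simply returns the original prediction.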
Submitted 31 May, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances
Authors:
Yaozu Wu,
Dongyuan Li,
Yankai Chen,
Renhe Jiang,
Henry Peng Zou,
Liancheng Fang,
Zhen Wang,
Philip S. Yu
Abstract:
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advancements in LLM-based multi-agent ADSs have focused on improving inter-agent communication and cooperation. This paper provides a frontier survey of LLM-based multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based approaches based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges in this field to support future research (https://anonymous.4open.science/r/LLM-based_Multi-agent_ADS-3A5C/README.md).
Submitted 23 February, 2025;
originally announced February 2025.
-
TabGen-ICL: Residual-Aware In-Context Example Selection for Tabular Data Generation
Authors:
Liancheng Fang,
Aiwei Liu,
Hengrui Zhang,
Henry Peng Zou,
Weizhi Zhang,
Philip S. Yu
Abstract:
Large language models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM's performance, resulting in sub-optimal generation quality. To address this, we propose a novel in-context learning framework, TabGen-ICL, to enhance the in-context learning ability of LLMs for tabular data generation. TabGen-ICL operates iteratively, retrieving a subset of real samples that represent the residual between currently generated samples and true data distributions. This approach serves two purposes: locally, it provides more effective in-context learning examples for the LLM in each iteration; globally, it progressively narrows the gap between generated and real data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-ICL significantly outperforms the random selection strategy. Specifically, it reduces the error rate by a margin of $3.5\%$ to $42.2\%$ on fidelity metrics. We demonstrate for the first time that prompting a fixed LLM can yield high-quality synthetic tabular data. The code is available at https://github.com/fangliancheng/TabGEN-ICL.
Submitted 22 February, 2025;
originally announced February 2025.
-
Accurate Forgetting for Heterogeneous Federated Continual Learning
Authors:
Abudukelimu Wuerkaixi,
Sen Cui,
Jingfeng Zhang,
Kunda Yan,
Bo Han,
Gang Niu,
Lei Fang,
Changshui Zhang,
Masashi Sugiyama
Abstract:
Recent years have witnessed a burgeoning interest in federated learning (FL). However, the contexts in which clients engage in sequential learning remain under-explored. Bridging FL and continual learning (CL) gives rise to a challenging practical problem: federated continual learning (FCL). Existing research in FCL primarily focuses on mitigating the catastrophic forgetting issue of continual learning while collaborating with other clients. We argue that the forgetting phenomena are not invariably detrimental. In this paper, we consider a more practical and challenging FCL setting characterized by potentially unrelated or even antagonistic data/tasks across different clients. In the FL scenario, statistical heterogeneity and data noise among clients may exhibit spurious correlations which result in biased feature learning. While existing CL strategies focus on a complete utilization of previous knowledge, we found that forgetting biased information is beneficial in our study. Therefore, we propose a new concept, accurate forgetting (AF), and develop a novel generative-replay method that selectively utilizes previous knowledge in federated networks. We employ a probabilistic framework based on a normalizing flow model to quantify the credibility of previous knowledge. Comprehensive experiments affirm the superiority of our method over baselines.
Submitted 19 February, 2025;
originally announced February 2025.
-
The contribution of dilatational motion to energy flux in homogeneous compressible turbulence
Authors:
Chensheng Luo,
Le Fang,
Jian Fang,
Haitao Xu,
Alain Pumir,
Ping-Fan Yang
Abstract:
We analyze the energy flux in compressible turbulence by generalizing the exact decomposition recently proposed by Johnson (Phys. Rev. Lett., vol. 124, 2020, 104501) to study incompressible turbulent flows. This allows us to characterize the effect of dilatational motion on the inter-scale energy transfer in three-dimensional compressible turbulence. Our analysis reveals that the contribution of dilatational motion to energy transfer is due to three different physical mechanisms: the interaction between dilatation and strain, between dilatation and vorticity, and the self-interaction of dilatational motion across scales. By analyzing numerical simulations of flows at moderate turbulent Mach numbers ($Ma_t \lesssim 0.3$), we validate our theoretical derivations and provide a quantitative description of the role of dilatational motion in energy transfer. In particular, we determine the scaling dependence of the dilatational contributions on the turbulent Mach number. Moreover, our findings reveal that the eddy-viscosity assumption often used in large-eddy simulations, in the spirit of the approach used for incompressible flows, effectively neglects the interaction between solenoidal-dilatational energy transfer and overestimates dilatational effects.
Submitted 11 February, 2025;
originally announced February 2025.
-
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
Authors:
Wenxi Li,
Yuchen Guo,
Jilai Zheng,
Haozhe Lin,
Chao Ma,
Lu Fang,
Xiaokang Yang
Abstract:
Recent years have seen an increase in the use of gigapixel-level image and video capture systems and benchmarks with high-resolution wide (HRW) shots. However, unlike close-up shots in the MS COCO dataset, the higher resolution and wider field of view raise unique challenges, such as extreme sparsity and huge scale changes, making existing close-up detectors inaccurate and inefficient. In this paper, we present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap in object detection between close-up and HRW shots. The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects. In this way, it can jointly explore global and local attention by fusing coarse- and fine-grained features to handle huge scale changes. SparseFormer also benefits from a novel Cross-slice non-maximum suppression (C-NMS) algorithm to precisely localize objects from noisy windows and a simple yet effective multi-scale strategy to improve accuracy. Extensive experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (up to 5.8%) and speed (up to 3x) over the state-of-the-art approaches.
Submitted 10 February, 2025;
originally announced February 2025.
-
Double-beta decay of $^{150}$Nd to excited levels of $^{150}$Sm
Authors:
A. S. Barabash,
P. Belli,
R. Bernabei,
R. S. Boiko,
F. Cappella,
V. Caracciolo,
R. Cerulli,
F. A. Danevich,
D. L. Fang,
F. Ferella,
A. Incicchitti,
V. V. Kobychev,
S. I. Konovalov,
M. Laubenstein,
A. Leoncini,
V. Merlo,
S. Nisi,
O. Nitescu,
D. V. Poda,
O. G. Polischuk,
I. B. -K. Shcherbakov,
F. Simkovic,
A. Timonina,
V. S. Tinkova,
V. I. Tretyak
, et al. (1 additional authors not shown)
Abstract:
The $2\nu2β$ decay of $^{150}$Nd to the first excited 740.5 keV $0^{+}_{1}$ level of $^{150}$Sm was measured over 5.845 yr with the help of a four-crystal low-background HPGe $γ$ spectrometry system in the underground low-background laboratory STELLA of LNGS-INFN. A 2.381 kg highly purified Nd-containing sample was employed as the decay source. The expected de-excitation gamma-quanta of the $0^{+}_{1}$ level with energies 334.0 keV and 406.5 keV were observed both in the one-dimensional spectrum and in coincidence data, resulting in the half-life $T_{1/2}=[0.83^{+0.18}_{-0.13}\mathrm{(stat)}^{+0.16}_{-0.19}\mathrm{(syst)}]\times 10^{20}$ yr. Interpreting an excess of the 334.0-keV peak area as an indication of the $2β$ decay of $^{150}$Nd to the 334.0 keV $2^+_1$ excited level of $^{150}$Sm with a half-life of $T_{1/2}=[1.5^{+2.3}_{-0.6}\mathrm{(stat)}\pm 0.4\mathrm{(syst)}]\times10^{20}$ yr, the $2\nu2β$ half-life of $^{150}$Nd for the transition to the 0$^{+}_{1}$ level is $T_{1/2}=[1.03^{+0.35}_{-0.22}\mathrm{(stat)}^{+0.16}_{-0.19}\mathrm{(syst)}]\times 10^{20}$ yr, in agreement with the previous experiments. Both half-life values reasonably agree with the theoretical calculations in the framework of proton-neutron QRPA with isospin restoration combined with like-nucleon QRPA for the description of excited states in the final nuclei. For $2\nu2β$ and $0\nu2β$ transitions of $^{150}$Nd and $^{148}$Nd to several excited levels of $^{150}$Sm and $^{148}$Sm, limits were set at the level of $T_{1/2}>10^{20}-10^{21}$ yr.
Submitted 2 February, 2025;
originally announced February 2025.
-
Optical centroid orbiting metrology
Authors:
Liang Fang,
Jinman Chen,
Qinjun Chen,
Chujun Zhao
Abstract:
Optical interferometry has dramatically advanced the development of modern science and technology. Here we introduce an interesting centroid evolution phenomenon of orbital angular momentum (OAM) interference fields with broken rotational symmetry, and establish a novel interferometric paradigm by fully exploiting centroid orbiting information. Compared with conventional interferometry, the centroid positions and their geometric trajectories provide more detectable information in a two-dimensional plane for sensing interferometric perturbations. We first investigate centroid orbital evolution under an inclined-angle perturbation, which allows for ultra-sensitive angle discrimination with arc-second resolution. We also show centroid ellipse evolution under spatial phase perturbation, which enables geometric characterization of arbitrary OAM superpositions on modal Poincaré spheres. Furthermore, based on the angle subdivision of centroid orbiting, we demonstrate environmentally robust nanoscale displacement measurement with polarization synchronous detection, and in particular high-resolution, fast, and large-range linear movement monitoring using commercial four-quadrant photodetectors. This novel centroid orbiting interferometry may open new opportunities to advance metrological technologies beyond conventional interferometers.
Submitted 12 January, 2025;
originally announced January 2025.
-
Roadmap on Neuromorphic Photonics
Authors:
Daniel Brunner,
Bhavin J. Shastri,
Mohammed A. Al Qadasi,
H. Ballani,
Sylvain Barbay,
Stefano Biasi,
Peter Bienstman,
Simon Bilodeau,
Wim Bogaerts,
Fabian Böhm,
G. Brennan,
Sonia Buckley,
Xinlun Cai,
Marcello Calvanese Strinati,
B. Canakci,
Benoit Charbonnier,
Mario Chemnitz,
Yitong Chen,
Stanley Cheung,
Jeff Chiles,
Suyeon Choi,
Demetrios N. Christodoulides,
Lukas Chrostowski,
J. Chu,
J. H. Clegg
, et al. (125 additional authors not shown)
Abstract:
This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field.
Submitted 16 January, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Large Language Models for Bioinformatics
Authors:
Wei Ruan,
Yanjun Lyu,
Jing Zhang,
Jiazhang Cai,
Peng Shu,
Yang Ge,
Yao Lu,
Shang Gao,
Yue Wang,
Peilong Wang,
Lin Zhao,
Tao Wang,
Yufang Liu,
Luyang Fang,
Ziyu Liu,
Zhengliang Liu,
Yiwei Li,
Zihao Wu,
Junhao Chen,
Hanqi Jiang,
Yi Pan,
Zhenyuan Yang,
Jingyuan Chen,
Shizhe Liang,
Wei Zhang
, et al. (30 additional authors not shown)
Abstract:
With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.
Submitted 9 January, 2025;
originally announced January 2025.
-
Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection
Authors:
Yuhang Gan,
Wenjie Xuan,
Zhiming Luo,
Lei Fang,
Zengmao Wang,
Juhua Liu,
Bo Du
Abstract:
When given two similar images, humans identify their differences by comparing the appearance (e.g., color, texture) with the help of semantics (e.g., objects, relations). However, mainstream change detection models adopt a supervised training paradigm, where the annotated binary change map is the main constraint. Thus, these methods primarily emphasize the difference-aware features between bi-temporal images and neglect the semantic understanding of the changed landscapes, which undermines the accuracy in the presence of noise and illumination variations. To this end, this paper explores incorporating semantic priors to improve the ability to detect changes. Firstly, we propose a Semantic-Aware Change Detection network, namely SA-CDNet, which transfers the common knowledge of the visual foundation models (i.e., FastSAM) to change detection. Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features. Secondly, we design a single-temporal semantic pre-training strategy to enhance the semantic understanding of landscapes, which brings further increments. Specifically, we construct pseudo-change detection data from public single-temporal remote sensing segmentation datasets for large-scale pre-training, where an extra branch is also introduced for the proxy semantic segmentation task. Experimental results on five challenging benchmarks demonstrate the superiority of our method over the existing state-of-the-art methods. The code is available at https://github.com/thislzm/SA-CD.
Submitted 22 December, 2024;
originally announced December 2024.
-
De-singularity Subgradient for the $q$-th-Powered $\ell_p$-Norm Weber Location Problem
Authors:
Zhao-Rong Lai,
Xiaotian Wu,
Liangda Fang,
Ziliang Chen,
Cheng Li
Abstract:
The Weber location problem is widely used in several artificial intelligence scenarios. However, the gradient of the objective does not exist at a considerable set of singular points. Recently, a de-singularity subgradient method has been proposed to fix this problem, but it can only handle the $q$-th-powered $\ell_2$-norm case ($1\leqslant q<2$), which has only finitely many singular points. In this paper, we further establish the de-singularity subgradient for the $q$-th-powered $\ell_p$-norm case with $1\leqslant q\leqslant p$ and $1\leqslant p<2$, which covers all the remaining unsolved cases of this problem. This is a challenging task because the singular set is a continuum. The geometry of the objective function is also complicated, which makes the subgradients, the minimum, and the descent direction very difficult to characterize. We develop a $q$-th-powered $\ell_p$-norm Weiszfeld Algorithm without Singularity ($q$P$p$NWAWS) for this problem, which ensures convergence and the descent property of the objective function. Extensive experiments on six real-world data sets demonstrate that $q$P$p$NWAWS successfully solves the singularity problem and achieves a linear computational convergence rate in practical scenarios.
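Editorial sketch for background only: the classical Weiszfeld iteration for the plain $\ell_2$-norm Weber problem ($q=1$, $p=2$), which the abstract's algorithm generalizes. This is not the $q$P$p$NWAWS method; the function name `weiszfeld`, the centroid start, and the crude `eps` clamp that avoids division by zero at a data point are illustrative assumptions. That clamp is exactly the kind of ad hoc workaround a de-singularity subgradient is meant to replace.

```python
import math


def weiszfeld(points, iters=500, eps=1e-12):
    """Classical Weiszfeld iteration: approximate the point minimizing the
    sum of Euclidean distances to `points` (unweighted case)."""
    dim = len(points[0])
    # Start from the centroid of the data points.
    y = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            # The true update is undefined when y coincides with a data point
            # (distance 0); clamping by eps is a naive guard, not a principled fix.
            inv = 1.0 / max(math.dist(p, y), eps)
            for d in range(dim):
                num[d] += inv * p[d]
            den += inv
        y = [c / den for c in num]
    return y


# For a convex quadrilateral the minimizer is the intersection of its
# diagonals; for these four points that is (4/3, 4/3).
median = weiszfeld([(0.0, 0.0), (4.0, 0.0), (6.0, 6.0), (0.0, 2.0)])
```

Each step is a re-weighted average of the data points with weights equal to inverse distances, which is why the update breaks down at the singular points discussed in the abstract.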
Submitted 3 February, 2025; v1 submitted 19 December, 2024;
originally announced December 2024.
-
A Syntactic Approach to Computing Complete and Sound Abstraction in the Situation Calculus
Authors:
Liangda Fang,
Xiaoman Wang,
Zhang Chen,
Kailun Luo,
Zhenhe Cui,
Quanlong Guan
Abstract:
Abstraction is an important and useful concept in the field of artificial intelligence. To the best of our knowledge, there is no syntactic method to compute a sound and complete abstraction from a given low-level basic action theory and a refinement mapping. This paper aims to address this issue. To this end, we first present a variant of situation calculus, namely linear integer situation calculus, which serves as the formalization of high-level basic action theory. We then migrate Banihashemi, De Giacomo, and Lespérance's abstraction framework to one from linear integer situation calculus to extended situation calculus. Furthermore, we identify a class of Golog programs, namely guarded actions, that is used to restrict low-level Golog programs, and impose some restrictions on refinement mappings. Finally, we design a syntactic approach to computing a sound and complete abstraction from a low-level basic action theory and a restricted refinement mapping.
Submitted 13 January, 2025; v1 submitted 15 December, 2024;
originally announced December 2024.
-
AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark
Authors:
Lan Li,
Liri Fang,
Vetle I. Torvik
Abstract:
We investigate the reasoning capabilities of large language models (LLMs) for automatically generating data-cleaning workflows. To evaluate LLMs' ability to complete data-cleaning tasks, we implemented a pipeline for LLM-based Auto Data Cleaning Workflow (AutoDCWorkflow), prompting LLMs on data cleaning operations to repair three types of data quality issues: duplicates, missing values, and inconsistent data formats. Given a dirty table and a purpose (expressed as a query), this pipeline generates a minimal, clean table sufficient to address the purpose and the data cleaning workflow used to produce the table. The planning process involves three main LLM-driven components: (1) Select Target Columns: Identifies a set of target columns related to the purpose. (2) Inspect Column Quality: Assesses the data quality for each target column and generates a Data Quality Report as operation objectives. (3) Generate Operation & Arguments: Predicts the next operation and arguments based on the data quality report results. Additionally, we propose a data cleaning benchmark to evaluate the capability of LLM agents to automatically generate workflows that address data cleaning purposes of varying difficulty levels. The benchmark comprises the annotated datasets as a collection of purpose, raw table, clean table, data cleaning workflow, and answer set. In our experiments, we evaluated three LLMs that auto-generate purpose-driven data cleaning workflows. The results indicate that LLMs perform well in planning and generating data-cleaning workflows without the need for fine-tuning.
Submitted 12 December, 2024; v1 submitted 9 December, 2024;
originally announced December 2024.
-
Automatic State Machine Inference for Binary Protocol Reverse Engineering
Authors:
Junhai Yang,
Fenghua Li,
Yixuan Zhang,
Junhao Zhang,
Liang Fang,
Yunchuan Guo
Abstract:
Protocol Reverse Engineering (PRE) is used to analyze protocols by inferring their structure and behavior. However, current PRE methods mainly focus on field identification within a single protocol and neglect Protocol State Machine (PSM) analysis in mixed protocol environments. This results in insufficient analysis of protocols' abnormal behavior and potential vulnerabilities, which are crucial for detecting and defending against new attack patterns. To address these challenges, we propose an automatic PSM inference framework for unknown protocols, including a fuzzy membership-based auto-converging DBSCAN algorithm for protocol format clustering, followed by a session clustering algorithm based on Needleman-Wunsch and K-Medoids algorithms to classify sessions by protocol type. Finally, we refine a probabilistic PSM algorithm to infer protocol states and the transition conditions between these states. Experimental results show that, compared with existing PRE techniques, our method can infer PSMs while enabling more precise classification of protocols.
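Editorial sketch (not the authors' code): the abstract mentions a session clustering step based on Needleman-Wunsch alignment. For readers unfamiliar with it, the snippet below computes the standard Needleman-Wunsch global alignment score between two byte sequences. The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; how the paper turns such scores into distances for K-Medoids is not stated in the abstract.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of sequences a and b,
    computed with a two-row dynamic program."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = cur[j - 1] + gap   # b[j-1] aligned to a gap
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


# Two toy message sequences that differ by one missing byte.
score = nw_score(b"ABC", b"AC")
```

A higher score means the two sequences align with fewer mismatches and gaps, so pairwise scores can serve as a similarity measure between messages or sessions.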
Submitted 3 December, 2024;
originally announced December 2024.
-
Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
Authors:
Zongru Wu,
Pengzhou Cheng,
Lingyong Fang,
Zhuosheng Zhang,
Gongshen Liu
Abstract:
Backdoor attacks remain significant security threats to generative large language models (LLMs). Since generative LLMs output sequences of high-dimensional token logits instead of low-dimensional classification logits, most existing backdoor defense methods designed for discriminative models like BERT are ineffective for generative LLMs. Inspired by the observed differences in learning behavior between backdoor and clean mapping in the frequency space, we transform the gradients of each training sample, which directly influence parameter updates, into the frequency space. Our findings reveal a distinct separation between the gradients of backdoor and clean samples in the frequency space. Based on this phenomenon, we propose Gradient Clustering in the Frequency Space for Backdoor Sample Filtering (GraCeFul), which leverages sample-wise gradients in the frequency space to effectively identify backdoor samples without retraining the LLMs. Experimental results show that GraCeFul outperforms baselines significantly. Notably, GraCeFul exhibits remarkable computational efficiency, achieving nearly 100% recall and F1 scores in identifying backdoor samples, reducing the average success rate of various backdoor attacks to 0% with negligible drops in clean accuracy across multiple free-style question answering datasets. Additionally, GraCeFul generalizes to Llama-2 and Vicuna. The code is publicly available at https://github.com/ZrW00/GraceFul.
Submitted 3 December, 2024;
originally announced December 2024.