Search | arXiv e-print repository

The Impact of Inhomogeneous Perturbations of the Inflaton on the Cosmological Primordial Magnetic Field

Authors: Yu Li, Shuang Liu, Hang Wang, Yao-Chuan Wang

Abstract: We investigate the impact of inhomogeneous inflaton perturbations on primordial magnetic fields within the framework of generalized inflationary magnetogenesis models. Extending the Ratra model to general spacetime backgrounds, we analyze the constraint structure of the electromagnetic field and demonstrate that the standard Coulomb gauge must be generalized to accommodate spatial inhomogeneities.… ▽ More We investigate the impact of inhomogeneous inflaton perturbations on primordial magnetic fields within the framework of generalized inflationary magnetogenesis models. Extending the Ratra model to general spacetime backgrounds, we analyze the constraint structure of the electromagnetic field and demonstrate that the standard Coulomb gauge must be generalized to accommodate spatial inhomogeneities. Instead of the vector potential, we solve the conjugate momentum with the modified initial conditions introduced by the coupling function, which become dominant during the late stages of inflation. These change the conditions under which scale-invariant electromagnetic spectra are achieved. Furthermore, we address the challenge of evaluating convolutions between vector potentials and inflaton perturbations by employing separate large- and small-scale approximations. The resulting influence to the electric and magnetic power spectra are quantified using $Δ_E$ and $Δ_B$, revealing a scale-dependent influence of inhomogeneities. We also find that the spectrum index evolution is sensitive to the sign of $V_φ$, with distinctive behaviors for electric and magnetic fields under different scale-invariance conditions. Notably, for nearly scale-invariant magnetic fields, the perturbative effects shift the spectral index towards the red and migrate toward smaller scales as inflation progresses, offering a potential observational probe to differentiate between large-field and small-field inflation scenarios. △ Less

Submitted 24 April, 2025; originally announced April 2025.

Comments: 13 pages, 1 figure

arXiv:2504.17384 [pdf, other]

On the workflow, opportunities and challenges of developing foundation model in geophysics

Authors: Hanlin Sheng, Xinming Wu, Hang Gao, Haibin Di, Sergey Fomel, Jintao Li, Xu Si

Abstract: Foundation models, as a mainstream technology in artificial intelligence, have demonstrated immense potential across various domains in recent years, particularly in handling complex tasks and multimodal data. In the field of geophysics, although the application of foundation models is gradually expanding, there is currently a lack of comprehensive reviews discussing the full workflow of integrati… ▽ More Foundation models, as a mainstream technology in artificial intelligence, have demonstrated immense potential across various domains in recent years, particularly in handling complex tasks and multimodal data. In the field of geophysics, although the application of foundation models is gradually expanding, there is currently a lack of comprehensive reviews discussing the full workflow of integrating foundation models with geophysical data. To address this gap, this paper presents a complete framework that systematically explores the entire process of developing foundation models in conjunction with geophysical data. From data collection and preprocessing to model architecture selection, pre-training strategies, and model deployment, we provide a detailed analysis of the key techniques and methodologies at each stage. In particular, considering the diversity, complexity, and physical consistency constraints of geophysical data, we discuss targeted solutions to address these challenges. Furthermore, we discuss how to leverage the transfer learning capabilities of foundation models to reduce reliance on labeled data, enhance computational efficiency, and incorporate physical constraints into model training, thereby improving physical consistency and interpretability. Through a comprehensive summary and analysis of the current technological landscape, this paper not only fills the gap in the geophysics domain regarding a full-process review of foundation models but also offers valuable practical guidance for their application in geophysical data analysis, driving innovation and advancement in the field. △ Less

Submitted 25 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

arXiv:2504.16693 [pdf, ps, other]

PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation

Authors: Wenxuan Li, Hang Zhao, Zhiyuan Yu, Yu Du, Qin Zou, Ruizhen Hu, Kai Xu

Abstract: While non-prehensile manipulation (e.g., controlled pushing/poking) constitutes a foundational robotic skill, its learning remains challenging due to the high sensitivity to complex physical interactions involving friction and restitution. To achieve robust policy learning and generalization, we opt to learn a world model of the 3D rigid body dynamics involved in non-prehensile manipulations and u… ▽ More While non-prehensile manipulation (e.g., controlled pushing/poking) constitutes a foundational robotic skill, its learning remains challenging due to the high sensitivity to complex physical interactions involving friction and restitution. To achieve robust policy learning and generalization, we opt to learn a world model of the 3D rigid body dynamics involved in non-prehensile manipulations and use it for model-based reinforcement learning. We propose PIN-WM, a Physics-INformed World Model that enables efficient end-to-end identification of a 3D rigid body dynamical system from visual observations. Adopting differentiable physics simulation, PIN-WM can be learned with only few-shot and task-agnostic physical interaction trajectories. Further, PIN-WM is learned with observational loss induced by Gaussian Splatting without needing state estimation. To bridge Sim2Real gaps, we turn the learned PIN-WM into a group of Digital Cousins via physics-aware randomizations which perturb physics and rendering parameters to generate diverse and meaningful variations of the PIN-WM. Extensive evaluations on both simulation and real-world tests demonstrate that PIN-WM, enhanced with physics-aware digital cousins, facilitates learning robust non-prehensile manipulation skills with Sim2Real transfer, surpassing the Real2Sim2Real state-of-the-arts. △ Less

Submitted 3 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

Comments: Robotics: Science and Systems 2025

arXiv:2504.16385 [pdf, other]

Distributed Space Resource Logistics Architecture Optimization under Economies of Scale

Authors: Evangelia Gkaravela, Hang Woon Lee, Hao Chen

Abstract: This paper proposes an optimization framework for distributed resource logistics system design to support future multimission space exploration. The performance and impact of distributed In-Situ Resource Utilization (ISRU) systems in facilitating space transportation are analyzed. The proposed framework considers technology trade studies, deployment strategy, facility location evaluation, and reso… ▽ More This paper proposes an optimization framework for distributed resource logistics system design to support future multimission space exploration. The performance and impact of distributed In-Situ Resource Utilization (ISRU) systems in facilitating space transportation are analyzed. The proposed framework considers technology trade studies, deployment strategy, facility location evaluation, and resource logistics after production for distributed ISRU systems. We develop piecewise linear sizing and cost estimation models based on economies of scale that can be easily integrated into network-based mission planning formulations. A case study on a multi-mission cislunar logistics campaign is conducted to demonstrate the value of the proposed method and evaluate key tradeoffs to compare the performance of distributed ISRU systems with traditional concentrated ISRU. Finally, a comprehensive sensitivity analysis is performed to assess the proposed system under varying conditions, comparing concentrated and distributed ISRU systems. △ Less

Submitted 22 April, 2025; originally announced April 2025.

Comments: 27 pages, 13 figures, Journal of Spacecraft and Rockets (Accepted)

arXiv:2504.16008 [pdf, other]

An Error Mitigated Non-Orthogonal Quantum Eigensolver via Shadow Tomography

Authors: Hang Ren, Yipei Zhang, Wendy M. Billings, Rebecca Tomann, Nikolay V. Tkachenko, Martin Head-Gordon, K. Birgitta Whaley

Abstract: We present a shadow-tomography-enhanced Non-Orthogonal Quantum Eigensolver (NOQE) for more efficient and accurate electronic structure calculations on near-term quantum devices. By integrating shadow tomography into the NOQE, the measurement cost scales linearly rather than quadratically with the number of reference states, while also reducing the required qubits and circuit depth by half. This ap… ▽ More We present a shadow-tomography-enhanced Non-Orthogonal Quantum Eigensolver (NOQE) for more efficient and accurate electronic structure calculations on near-term quantum devices. By integrating shadow tomography into the NOQE, the measurement cost scales linearly rather than quadratically with the number of reference states, while also reducing the required qubits and circuit depth by half. This approach enables extraction of all matrix elements via randomized measurements and classical postprocessing. We analyze its sample complexity and show that, for small systems, it remains constant in the high-precision regime, while for larger systems, it scales linearly with the system size. We further apply shadow-based error mitigation to suppress noise-induced bias without increasing quantum resources. Demonstrations on the hydrogen molecule in the strongly correlated regime achieve chemical accuracy under realistic noise, showing that our method is both resource-efficient and noise-resilient for practical quantum chemistry simulations in the near term. △ Less

Submitted 23 April, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

arXiv:2504.15300 [pdf, other]

Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions

Authors: Chaoyue Niu, Yucheng Ding, Junhui Lu, Zhengxiang Huang, Hang Zeng, Yutong Dai, Xuezhen Tu, Chengfei Lv, Fan Wu, Guihai Chen

Abstract: The conventional cloud-based large model learning framework is increasingly constrained by latency, cost, personalization, and privacy concerns. In this survey, we explore an emerging paradigm: collaborative learning between on-device small model and cloud-based large model, which promises low-latency, cost-efficient, and personalized intelligent services while preserving user privacy. We provide… ▽ More The conventional cloud-based large model learning framework is increasingly constrained by latency, cost, personalization, and privacy concerns. In this survey, we explore an emerging paradigm: collaborative learning between on-device small model and cloud-based large model, which promises low-latency, cost-efficient, and personalized intelligent services while preserving user privacy. We provide a comprehensive review across hardware, system, algorithm, and application layers. At each layer, we summarize key problems and recent advances from both academia and industry. In particular, we categorize collaboration algorithms into data-based, feature-based, and parameter-based frameworks. We also review publicly available datasets and evaluation metrics with user-level or device-level consideration tailored to collaborative learning settings. We further highlight real-world deployments, ranging from recommender systems and mobile livestreaming to personal intelligent assistants. We finally point out open research directions to guide future development in this rapidly evolving field. △ Less

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.14594 [pdf, other]

HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models

Authors: Fan Gao, Xinjie Zhao, Ding Xia, Zhongyi Zhou, Rui Yang, Jinghui Lu, Hang Jiang, Chanjun Park, Irene Li

Abstract: Seeking dietary guidance often requires navigating complex professional knowledge while accommodating individual health conditions. Knowledge Graphs (KGs) offer structured and interpretable nutritional information, whereas Large Language Models (LLMs) naturally facilitate conversational recommendation delivery. In this paper, we present HealthGenie, an interactive system that combines the strength… ▽ More Seeking dietary guidance often requires navigating complex professional knowledge while accommodating individual health conditions. Knowledge Graphs (KGs) offer structured and interpretable nutritional information, whereas Large Language Models (LLMs) naturally facilitate conversational recommendation delivery. In this paper, we present HealthGenie, an interactive system that combines the strengths of LLMs and KGs to provide personalized dietary recommendations along with hierarchical information visualization for a quick and intuitive overview. Upon receiving a user query, HealthGenie performs query refinement and retrieves relevant information from a pre-built KG. The system then visualizes and highlights pertinent information, organized by defined categories, while offering detailed, explainable recommendation rationales. Users can further tailor these recommendations by adjusting preferences interactively. Our evaluation, comprising a within-subject comparative experiment and an open-ended discussion, demonstrates that HealthGenie effectively supports users in obtaining personalized dietary guidance based on their health conditions while reducing interaction effort and cognitive load. These findings highlight the potential of LLM-KG integration in supporting decision-making through explainable and visualized information. We examine the system's usefulness and effectiveness with an N=12 within-subject study and provide design considerations for future systems that integrate conversational LLM and KG. △ Less

Submitted 20 April, 2025; originally announced April 2025.

arXiv:2504.13914 [pdf, other]

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Authors: ByteDance Seed, :, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen , et al. (249 additional authors not shown)

Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For in… ▽ More We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark. △ Less

Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.13865 [pdf, ps, other]

A Survey on (M)LLM-Based GUI Agents

Authors: Fei Tang, Haolei Xu, Hang Zhang, Siqi Chen, Xingyu Wu, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Zeqi Tan, Yuchen Yan, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang

Abstract: Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction, evolving from rule-based automation scripts to sophisticated AI-driven systems capable of understanding and executing complex interface operations. This survey provides a comprehensive examination of the rapidly advancing field of LLM-based GUI Agents, systematically analyzing their archi… ▽ More Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction, evolving from rule-based automation scripts to sophisticated AI-driven systems capable of understanding and executing complex interface operations. This survey provides a comprehensive examination of the rapidly advancing field of LLM-based GUI Agents, systematically analyzing their architectural foundations, technical components, and evaluation methodologies. We identify and analyze four fundamental components that constitute modern GUI Agents: (1) perception systems that integrate text-based parsing with multimodal understanding for comprehensive interface comprehension; (2) exploration mechanisms that construct and maintain knowledge bases through internal modeling, historical experience, and external information retrieval; (3) planning frameworks that leverage advanced reasoning methodologies for task decomposition and execution; and (4) interaction systems that manage action generation with robust safety controls. Through rigorous analysis of these components, we reveal how recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms. We critically examine current evaluation frameworks, highlighting methodological limitations in existing benchmarks while proposing directions for standardization. This survey also identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control, while outlining promising research directions for enhancing GUI Agents' capabilities. Our systematic review provides researchers and practitioners with a thorough understanding of the field's current state and offers insights into future developments in intelligent interface automation. △ Less

Submitted 4 June, 2025; v1 submitted 27 March, 2025; originally announced April 2025.

arXiv:2504.12892 [pdf, other]

Manifold-valued function approximation from multiple tangent spaces

Authors: Hang Wang, Raf Vandebril, Joeri Van der Veken, Nick Vannieuwenhoven

Abstract: Approximating a manifold-valued function from samples of input-output pairs consists of modeling the relationship between an input from a vector space and an output on a Riemannian manifold. We propose a function approximation method that leverages and unifies two prior techniques: (i) approximating a pullback to the tangent space, and (ii) the Riemannian moving least squares method. The core idea… ▽ More Approximating a manifold-valued function from samples of input-output pairs consists of modeling the relationship between an input from a vector space and an output on a Riemannian manifold. We propose a function approximation method that leverages and unifies two prior techniques: (i) approximating a pullback to the tangent space, and (ii) the Riemannian moving least squares method. The core idea of the new scheme is to combine pullbacks to multiple tangent spaces with a weighted Fréchet mean. The effectiveness of this approach is illustrated with numerical experiments on model problems from parametric model order reduction. △ Less

Submitted 17 April, 2025; originally announced April 2025.

Comments: 25 pages, 7 figures

MSC Class: 65D15; 65D40; 65J99; 46T20; 53B20; 58C25

arXiv:2504.12711 [pdf, other]

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou , et al. (112 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ… ▽ More This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includes day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. There are a total of 361 participants in the competition, and 32 teams submitting valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/. △ Less

Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

arXiv:2504.12643 [pdf, ps, other]

RoPETR: Improving Temporal Camera-Only 3D Detection by Integrating Enhanced Rotary Position Embedding

Authors: Hang Ji, Tao Ni, Xufeng Huang, Zhan Shi, Tao Luo, Xin Zhan, Junbo Chen

Abstract: This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score. While StreamPETR exhibits strong 3D bounding box detection performance as reflected by its high mean Average Precision our analysis identified velocity estimation as a substantial bottleneck whe… ▽ More This technical report introduces a targeted improvement to the StreamPETR framework, specifically aimed at enhancing velocity estimation, a critical factor influencing the overall NuScenes Detection Score. While StreamPETR exhibits strong 3D bounding box detection performance as reflected by its high mean Average Precision our analysis identified velocity estimation as a substantial bottleneck when evaluated on the NuScenes dataset. To overcome this limitation, we propose a customized positional embedding strategy tailored to enhance temporal modeling capabilities. Experimental evaluations conducted on the NuScenes test set demonstrate that our improved approach achieves a state-of-the-art NDS of 70.86% using the ViT-L backbone, setting a new benchmark for camera-only 3D object detection. △ Less

Submitted 6 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.12276 [pdf, other]

The Tenth NTIRE 2025 Image Denoising Challenge Report

Authors: Lei Sun, Hang Guo, Bin Ren, Luc Van Gool, Radu Timofte, Yawei Li, Xiangyu Kong, Hyunhee Park, Xiaoxuan Yu, Suejin Han, Hakjae Jeon, Jia Li, Hyung-Ju Chun, Donghun Ryou, Inju Ha, Bohyung Han, Jingyu Ma, Zhijuan Huang, Huiyuan Fu, Hongyuan Yu, Boqi Zhang, Jiawei Shi, Heng Zhang, Huadong Ma, Deepak Kumar Tyagi , et al. (69 additional authors not shown)

Abstract: This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent ad… ▽ More This paper presents an overview of the NTIRE 2025 Image Denoising Challenge (σ = 50), highlighting the proposed methodologies and corresponding results. The primary objective is to develop a network architecture capable of achieving high-quality denoising performance, quantitatively evaluated using PSNR, without constraints on computational complexity or model size. The task assumes independent additive white Gaussian noise (AWGN) with a fixed noise level of 50. A total of 290 participants registered for the challenge, with 20 teams successfully submitting valid results, providing insights into the current state-of-the-art in image denoising. △ Less

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2504.12234 [pdf, other]

MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models

Authors: Hang Yuan, Lei Yu, Zhirong Huang, Jingyuan Zhang, Junyi Lu, Shiqi Cheng, Li Yang, Fengjun Zhang, Jiajia Ma, Chun Zuo

Abstract: Smart contract vulnerabilities pose significant security risks to blockchain systems, potentially leading to severe financial losses. Existing methods face several limitations: (1) Program analysis-based approaches rely on predefined patterns, lacking flexibility for new vulnerability types; (2) Deep learning-based methods lack explanations; (3) Large language model-based approaches suffer from hi… ▽ More Smart contract vulnerabilities pose significant security risks to blockchain systems, potentially leading to severe financial losses. Existing methods face several limitations: (1) Program analysis-based approaches rely on predefined patterns, lacking flexibility for new vulnerability types; (2) Deep learning-based methods lack explanations; (3) Large language model-based approaches suffer from high false positives. We propose MOS, a smart contract vulnerability detection framework based on mixture-of-experts tuning (MOE-Tuning) of large language models. First, we conduct continual pre-training on a large-scale smart contract dataset to provide domain-enhanced initialization. Second, we construct a high-quality MOE-Tuning dataset through a multi-stage pipeline combining LLM generation and expert verification for reliable explanations. Third, we design a vulnerability-aware routing mechanism that activates the most relevant expert networks by analyzing code features and their matching degree with experts. Finally, we extend the feed-forward layers into multiple parallel expert networks, each specializing in specific vulnerability patterns. We employ a dual-objective loss function: one for optimizing detection and explanation performance, and another for ensuring reasonable distribution of vulnerability types to experts through entropy calculation. Experiments show that MOS significantly outperforms existing methods with average improvements of 6.32% in F1 score and 4.80% in accuracy. The vulnerability explanations achieve positive ratings (scores of 3-4 on a 4-point scale) of 82.96%, 85.21% and 94.58% for correctness, completeness, and conciseness through human and LLM evaluation. △ Less

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2504.11823 [pdf, other]

doi 10.1109/TVT.2025.3560658

Multi-goal Rapidly Exploring Random Tree with Safety and Dynamic Constraints for UAV Cooperative Path Planning

Authors: Thu Hang Khuat, Duy-Nam Bui, Hoa TT. Nguyen, Mien L. Trinh, Minh T. Nguyen, Manh Duong Phung

Abstract: Cooperative path planning is gaining its importance due to the increasing demand on using multiple unmanned aerial vehicles (UAVs) for complex missions. This work addresses the problem by introducing a new algorithm named MultiRRT that extends the rapidly exploring random tree (RRT) to generate paths for a group of UAVs to reach multiple goal locations at the same time. We first derive the dynamic… ▽ More Cooperative path planning is gaining its importance due to the increasing demand on using multiple unmanned aerial vehicles (UAVs) for complex missions. This work addresses the problem by introducing a new algorithm named MultiRRT that extends the rapidly exploring random tree (RRT) to generate paths for a group of UAVs to reach multiple goal locations at the same time. We first derive the dynamics constraint of the UAV and include it in the problem formulation. MultiRRT is then developed, taking into account the cooperative requirements and safe constraints during its path-searching process. The algorithm features two new mechanisms, node reduction and Bezier interpolation, to ensure the feasibility and optimality of the paths generated. Importantly, the interpolated paths are proven to meet the safety and dynamics constraints imposed by obstacles and the UAVs. A number of simulations, comparisons, and experiments have been conducted to evaluate the performance of the proposed approach. The results show that MultiRRT can generate collision-free paths for multiple UAVs to reach their goals with better scores in path length and smoothness metrics than state-of-the-art RRT variants including Theta-RRT, FN-RRT, RRT*, and RRT*-Smart. The generated paths are also tested in practical flights with real UAVs to evaluate their validity for cooperative tasks. The source code of the algorithm is available at https://github.com/duynamrcv/multi-target_RRT △ Less

Submitted 16 April, 2025; originally announced April 2025.

Journal ref: IEEE Transactions on Vehicular Technology, 2025

arXiv:2504.11711 [pdf, ps, other]

The Hitchhiker's Guide to Program Analysis, Part II: Deep Thoughts by LLMs

Authors: Haonan Li, Hang Zhang, Kexin Pei, Zhiyun Qian

Abstract: Static analysis plays a crucial role in software vulnerability detection, yet faces a persistent precision-scalability tradeoff. In large codebases like the Linux kernel, traditional static analysis tools often generate excessive false positives due to simplified vulnerability modeling and overapproximation of path and data constraints. While large language models (LLMs) demonstrate promising code… ▽ More Static analysis plays a crucial role in software vulnerability detection, yet faces a persistent precision-scalability tradeoff. In large codebases like the Linux kernel, traditional static analysis tools often generate excessive false positives due to simplified vulnerability modeling and overapproximation of path and data constraints. While large language models (LLMs) demonstrate promising code understanding capabilities, their direct application to program analysis remains unreliable due to inherent reasoning limitations. We introduce BugLens, a post-refinement framework that significantly enhances static analysis precision for bug detection. BugLens guides LLMs through structured reasoning steps to assess security impact and validate constraints from the source code. When evaluated on Linux kernel taint-style bugs detected by static analysis tools, BugLens improves precision approximately 7-fold (from 0.10 to 0.72), substantially reducing false positives while uncovering four previously unreported vulnerabilities. Our results demonstrate that a well-structured, fully automated LLM-based workflow can effectively complement and enhance traditional static analysis techniques. △ Less

Submitted 31 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

arXiv:2504.10686 [pdf, other]

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang , et al. (122 additional authors not shown)

Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the… ▽ More This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the $\operatorname{DIV2K\_LSDIR\_test}$ dataset. A robust participation saw \textbf{244} registered entrants, with \textbf{43} teams submitting valid entries. This report meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques. The analysis highlights innovative approaches and establishes benchmarks for future research in the field. △ Less

Submitted 14 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

arXiv:2504.10416 [pdf, other]

Region Based SLAM-Aware Exploration: Efficient and Robust Autonomous Mapping Strategy That Can Scale

Authors: Megha Maheshwari, Sadeigh Rabiee, He Yin, Martin Labrie, Hang Liu

Abstract: Autonomous exploration for mapping unknown large scale environments is a fundamental challenge in robotics, with efficiency in time, stability against map corruption and computational resources being crucial. This paper presents a novel approach to indoor exploration that addresses these key issues in existing methods. We introduce a Simultaneous Localization and Mapping (SLAM)-aware region-based… ▽ More Autonomous exploration for mapping unknown large scale environments is a fundamental challenge in robotics, with efficiency in time, stability against map corruption and computational resources being crucial. This paper presents a novel approach to indoor exploration that addresses these key issues in existing methods. We introduce a Simultaneous Localization and Mapping (SLAM)-aware region-based exploration strategy that partitions the environment into discrete regions, allowing the robot to incrementally explore and stabilize each region before moving to the next one. This approach significantly reduces redundant exploration and improves overall efficiency. As the device finishes exploring a region and stabilizes it, we also perform SLAM keyframe marginalization, a technique which reduces problem complexity by eliminating variables, while preserving their essential information. To improves robustness and further enhance efficiency, we develop a check- point system that enables the robot to resume exploration from the last stable region in case of failures, eliminating the need for complete re-exploration. Our method, tested in real homes, office and simulations, outperforms state-of-the-art approaches. The improvements demonstrate substantial enhancements in various real world environments, with significant reductions in keyframe usage (85%), submap usage (50% office, 32% home), pose graph optimization time (78-80%), and exploration duration (10-15%). This region-based strategy with keyframe marginalization offers an efficient solution for autonomous robotic mapping. △ Less

Submitted 14 April, 2025; originally announced April 2025.

Comments: 8 pages, 9 figures

arXiv:2504.10233 [pdf, other]

doi 10.1145/3689031.3717456

Bingo: Radix-based Bias Factorization for Random Walk on Dynamic Graphs

Authors: Pinhuan Wang, Chengying Huan, Zhibin Wang, Chen Tian, Yuede Ji, Hang Liu

Abstract: Random walks are a primary means for extracting information from large-scale graphs. While most real-world graphs are inherently dynamic, state-of-the-art random walk engines failed to efficiently support such a critical use case. This paper takes the initiative to build a general random walk engine for dynamically changing graphs with two key principles: (i) This system should support both low-la… ▽ More Random walks are a primary means for extracting information from large-scale graphs. While most real-world graphs are inherently dynamic, state-of-the-art random walk engines failed to efficiently support such a critical use case. This paper takes the initiative to build a general random walk engine for dynamically changing graphs with two key principles: (i) This system should support both low-latency streaming updates and high-throughput batched updates. (ii) This system should achieve fast sampling speed while maintaining acceptable space consumption to support dynamic graph updates. Upholding both standards, we introduce Bingo, a GPU-based random walk engine for dynamically changing graphs. First, we propose a novel radix-based bias factorization algorithm to support constant time sampling complexity while supporting fast streaming updates. Second, we present a group-adaption design to reduce space consumption dramatically. Third, we incorporate GPU-aware designs to support high-throughput batched graph updates on massively parallel platforms. Together, Bingo outperforms existing efforts across various applications, settings, and datasets, achieving up to a 271.11x speedup compared to the state-of-the-art efforts. △ Less

Submitted 14 April, 2025; originally announced April 2025.

Comments: 17 pages, Published in EuroSys'25

Journal ref: Proceedings of the Twentieth European Conference on Computer Systems, 2025, pp. 605-620

arXiv:2504.10076 [pdf, other]

Towards Scalable Bayesian Optimization via Gradient-Informed Bayesian Neural Networks

Authors: Georgios Makrygiorgos, Joshua Hang Sai Ip, Ali Mesbah

Abstract: Bayesian optimization (BO) is a widely used method for data-driven optimization that generally relies on zeroth-order data of objective function to construct probabilistic surrogate models. These surrogates guide the exploration-exploitation process toward finding global optimum. While Gaussian processes (GPs) are commonly employed as surrogates of the unknown objective function, recent studies ha… ▽ More Bayesian optimization (BO) is a widely used method for data-driven optimization that generally relies on zeroth-order data of objective function to construct probabilistic surrogate models. These surrogates guide the exploration-exploitation process toward finding global optimum. While Gaussian processes (GPs) are commonly employed as surrogates of the unknown objective function, recent studies have highlighted the potential of Bayesian neural networks (BNNs) as scalable and flexible alternatives. Moreover, incorporating gradient observations into GPs, when available, has been shown to improve BO performance. However, the use of gradients within BNN surrogates remains unexplored. By leveraging automatic differentiation, gradient information can be seamlessly integrated into BNN training, resulting in more informative surrogates for BO. We propose a gradient-informed loss function for BNN training, effectively augmenting function observations with local gradient information. The effectiveness of this approach is demonstrated on well-known benchmarks in terms of improved BNN predictions and faster BO convergence as the number of decision variables increases. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.10014 [pdf, other]

Air Quality Prediction with A Meteorology-Guided Modality-Decoupled Spatio-Temporal Network

Authors: Hang Yin, Yan-Ming Zhang, Jian Xu, Jian-Long Chang, Yin Li, Cheng-Lin Liu

Abstract: Air quality prediction plays a crucial role in public health and environmental protection. Accurate air quality prediction is a complex multivariate spatiotemporal problem, that involves interactions across temporal patterns, pollutant correlations, spatial station dependencies, and particularly meteorological influences that govern pollutant dispersion and chemical transformations. Existing works… ▽ More Air quality prediction plays a crucial role in public health and environmental protection. Accurate air quality prediction is a complex multivariate spatiotemporal problem, that involves interactions across temporal patterns, pollutant correlations, spatial station dependencies, and particularly meteorological influences that govern pollutant dispersion and chemical transformations. Existing works underestimate the critical role of atmospheric conditions in air quality prediction and neglect comprehensive meteorological data utilization, thereby impairing the modeling of dynamic interdependencies between air quality and meteorological data. To overcome this, we propose MDSTNet, an encoder-decoder framework that explicitly models air quality observations and atmospheric conditions as distinct modalities, integrating multi-pressure-level meteorological data and weather forecasts to capture atmosphere-pollution dependencies for prediction. Meantime, we construct ChinaAirNet, the first nationwide dataset combining air quality records with multi-pressure-level meteorological observations. Experimental results on ChinaAirNet demonstrate MDSTNet's superiority, substantially reducing 48-hour prediction errors by 17.54\% compared to the state-of-the-art model. The source code and dataset will be available on github. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.09280 [pdf, ps, other]

Full asymptotic expansions of the Humbert function $Φ_1$

Authors: Peng-Cheng Hang, Liangjian Hu

Abstract: We derive full asymptotic expansions for the Humbert function $Φ_1$ in different limiting regimes of its variables. Our derivation employs various asymptotic methods and relies on key transformation formulae established by Erdélyi (1940), and Tuan and Kalla (1987). The efficiency of our asymptotic results are also illustrated through two applications: (1) analytic continuations of Saran's function… ▽ More We derive full asymptotic expansions for the Humbert function $Φ_1$ in different limiting regimes of its variables. Our derivation employs various asymptotic methods and relies on key transformation formulae established by Erdélyi (1940), and Tuan and Kalla (1987). The efficiency of our asymptotic results are also illustrated through two applications: (1) analytic continuations of Saran's function $F_M$, and (2) two limits arising in the study of the $1D$ Glauber-Ising model. Finally, some promising directions for future research are highlighted. △ Less

Submitted 12 April, 2025; originally announced April 2025.

Comments: 12 pages

MSC Class: 33C65; 41A60; 82C20

arXiv:2504.08694 [pdf, other]

TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning

Authors: Hang Ni, Fan Liu, Xinyu Ma, Lixin Su, Shuaiqiang Wang, Dawei Yin, Hui Xiong, Hao Liu

Abstract: Large language models (LLMs) have shown promise in automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspects such as route efficiency, POI appeal, and real-time adaptability. This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotem… ▽ More Large language models (LLMs) have shown promise in automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspects such as route efficiency, POI appeal, and real-time adaptability. This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotemporal-aware travel planning. Our dataset includes 2,348 real-world travel queries, 85,575 fine-grain annotated POIs, and 18,784 high-quality travel trajectory references sourced from online tourist documents, enabling dynamic and context-aware planning. Through extensive experiments, we reveal that integrating reference trajectories significantly improves spatial efficiency and POI rationality of the travel plan, while challenges persist in universality and robustness due to conflicting references and noisy data. To address these issues, we propose EvoRAG, an evolutionary framework that potently synergizes diverse retrieved trajectories with LLMs' intrinsic reasoning. EvoRAG achieves state-of-the-art performance, improving spatiotemporal compliance and reducing commonsense violation compared to ground-up and retrieval-augmented baselines. Our work underscores the potential of hybridizing Web knowledge with LLM-driven optimization, paving the way for more reliable and adaptive travel planning agents. △ Less

Submitted 11 April, 2025; originally announced April 2025.

arXiv:2504.08672 [pdf, other]

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Authors: Fangzhi Xu, Hang Yan, Chang Ma, Haiteng Zhao, Qiushi Sun, Kanzhi Cheng, Junxian He, Jun Liu, Zhiyong Wu

Abstract: Advancing LLM reasoning skills has captivated wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, which face the problem of scalability and high annotation costs. This motivates us to enhance LLM reasoning without the need for external supervision. We introduce a generalizable and purely unsupervised… ▽ More Advancing LLM reasoning skills has captivated wide interest. However, current post-training techniques rely heavily on supervisory signals, such as outcome supervision or auxiliary reward models, which face the problem of scalability and high annotation costs. This motivates us to enhance LLM reasoning without the need for external supervision. We introduce a generalizable and purely unsupervised self-training framework, named Genius. Without external auxiliary, Genius requires to seek the optimal response sequence in a stepwise manner and optimize the LLM. To explore the potential steps and exploit the optimal ones, Genius introduces a stepwise foresight re-sampling strategy to sample and estimate the step value by simulating future outcomes. Further, we recognize that the unsupervised setting inevitably induces the intrinsic noise and uncertainty. To provide a robust optimization, we propose an advantage-calibrated optimization (ACO) loss function to mitigate estimation inconsistencies. Combining these techniques together, Genius provides an advanced initial step towards self-improve LLM reasoning with general queries and without supervision, revolutionizing reasoning scaling laws given the vast availability of general queries. The code will be released at https://github.com/xufangzhi/Genius. △ Less

Submitted 11 April, 2025; originally announced April 2025.

Comments: 14 pages, 7 figures

arXiv:2504.08619 [pdf, other]

Analyzing 16,193 LLM Papers for Fun and Profits

Authors: Zhiqiu Xia, Lang Zhu, Bingzhe Li, Feng Chen, Qiannan Li, Chunhua Liao, Feiyi Wang, Hang Liu

Abstract: Large Language Models (LLMs) are reshaping the landscape of computer science research, driving significant shifts in research priorities across diverse conferences and fields. This study provides a comprehensive analysis of the publication trend of LLM-related papers in 77 top-tier computer science conferences over the past six years (2019-2024). We approach this analysis from four distinct perspe… ▽ More Large Language Models (LLMs) are reshaping the landscape of computer science research, driving significant shifts in research priorities across diverse conferences and fields. This study provides a comprehensive analysis of the publication trend of LLM-related papers in 77 top-tier computer science conferences over the past six years (2019-2024). We approach this analysis from four distinct perspectives: (1) We investigate how LLM research is driving topic shifts within major conferences. (2) We adopt a topic modeling approach to identify various areas of LLM-related topic growth and reveal the topics of concern at different conferences. (3) We explore distinct contribution patterns of academic and industrial institutions. (4) We study the influence of national origins on LLM development trajectories. Synthesizing the findings from these diverse analytical angles, we derive ten key insights that illuminate the dynamics and evolution of the LLM research ecosystem. △ Less

Submitted 22 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

arXiv:2504.08334 [pdf, other]

Efficient Architecture for RISC-V Vector Memory Access

Authors: Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang

Abstract: Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or scattering elements at fixed strides remains challenging. Naive approaches rely on high-overhead crossbars that remap any byte between memory and registers, leading to physical design issues. Segment o… ▽ More Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or scattering elements at fixed strides remains challenging. Naive approaches rely on high-overhead crossbars that remap any byte between memory and registers, leading to physical design issues. Segment operations require row-column transpositions, typically handled using either element-level in-place transposition (degrading performance) or large buffer-based bulk transposition (incurring high area overhead). In this paper, we present EARTH, a novel vector memory access architecture designed to overcome these challenges through shifting-based optimizations. For strided accesses, EARTH integrates specialized shift networks for gathering and scattering elements. After coalescing multiple accesses within the same cache line, data is routed between memory and registers through the shifting network with minimal overhead. For segment operations, EARTH employs a shifted register bank enabling direct column-wise access, eliminating dedicated segment buffers while providing high-performance, in-place bulk transposition. Implemented on FPGA with Chisel HDL based on an open-source RISC-V vector unit, EARTH enhances performance for strided memory accesses, achieving 4x-8x speedups in benchmarks dominated by strided operations. Compared to conventional designs, EARTH reduces hardware area by 9% and power consumption by 41%, significantly advancing both performance and efficiency of vector processors. △ Less

Submitted 16 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

arXiv:2504.06330 [pdf, other]

Analyzing the Impact of Low-Rank Adaptation for Cross-Domain Few-Shot Object Detection in Aerial Images

Authors: Hicham Talaoubrid, Anissa Mokraoui, Ismail Ben Ayed, Axel Prouvost, Sonimith Hang, Monit Korn, Rémi Harvey

Abstract: This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results s… ▽ More This paper investigates the application of Low-Rank Adaptation (LoRA) to small models for cross-domain few-shot object detection in aerial images. Originally designed for large-scale models, LoRA helps mitigate overfitting, making it a promising approach for resource-constrained settings. We integrate LoRA into DiffusionDet, and evaluate its performance on the DOTA and DIOR datasets. Our results show that LoRA applied after an initial fine-tuning slightly improves performance in low-shot settings (e.g., 1-shot and 5-shot), while full fine-tuning remains more effective in higher-shot configurations. These findings highlight LoRA's potential for efficient adaptation in aerial object detection, encouraging further research into parameter-efficient fine-tuning strategies for few-shot learning. Our code is available here: https://github.com/HichTala/LoRA-DiffusionDet. △ Less

Submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.05810 [pdf, other]

PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning

Authors: Xinpeng Ding, Kui Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Xiaomeng Li

Abstract: Direct Preference Optimization (DPO) helps reduce hallucinations in Video Multimodal Large Language Models (VLLMs), but its reliance on offline preference data limits adaptability and fails to capture true video-response misalignment. We propose Video Direct Preference Optimization (VDPO), an online preference learning framework that eliminates the need for preference annotation by leveraging vide… ▽ More Direct Preference Optimization (DPO) helps reduce hallucinations in Video Multimodal Large Language Models (VLLMs), but its reliance on offline preference data limits adaptability and fails to capture true video-response misalignment. We propose Video Direct Preference Optimization (VDPO), an online preference learning framework that eliminates the need for preference annotation by leveraging video augmentations to generate rejected samples while keeping responses fixed. However, selecting effective augmentations is non-trivial, as some clips may be semantically identical to the original under specific prompts, leading to false rejections and disrupting alignment. To address this, we introduce Prompt-aware Multi-instance Learning VDPO (PaMi-VDPO), which selects augmentations based on prompt context. Instead of a single rejection, we construct a candidate set of augmented clips and apply a close-to-far selection strategy, initially ensuring all clips are semantically relevant while then prioritizing the most prompt-aware distinct clip. This allows the model to better capture meaningful visual differences, mitigating hallucinations, while avoiding false rejections, and improving alignment. PaMi-VDPOseamlessly integrates into existing VLLMs without additional parameters, GPT-4/human supervision. With only 10k SFT data, it improves the base model by 5.3% on VideoHallucer, surpassing GPT-4o, while maintaining stable performance on general video benchmarks. △ Less

Submitted 15 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.05541 [pdf, other]

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Authors: Yunlong Tang, Jing Bi, Chao Huang, Susan Liang, Daiki Shimada, Hang Hua, Yunzhong Xiao, Yizhi Song, Pinxin Liu, Mingqian Feng, Junjia Guo, Zhuo Liu, Luchuan Song, Ali Vosoughi, Jinxi He, Liu He, Zeliang Zhang, Jiebo Luo, Chenliang Xu

Abstract: We present CAT-V (Caption AnyThing in Video), a training-free framework for fine-grained object-centric video captioning that enables detailed descriptions of user-selected objects through time. CAT-V integrates three key components: a Segmenter based on SAMURAI for precise object segmentation across frames, a Temporal Analyzer powered by TRACE-Uni for accurate event boundary detection and tempora… ▽ More We present CAT-V (Caption AnyThing in Video), a training-free framework for fine-grained object-centric video captioning that enables detailed descriptions of user-selected objects through time. CAT-V integrates three key components: a Segmenter based on SAMURAI for precise object segmentation across frames, a Temporal Analyzer powered by TRACE-Uni for accurate event boundary detection and temporal analysis, and a Captioner using InternVL-2.5 for generating detailed object-centric descriptions. Through spatiotemporal visual prompts and chain-of-thought reasoning, our framework generates detailed, temporally-aware descriptions of objects' attributes, actions, statuses, interactions, and environmental contexts without requiring additional training data. CAT-V supports flexible user interactions through various visual prompts (points, bounding boxes, and irregular regions) and maintains temporal sensitivity by tracking object states and interactions across different time segments. Our approach addresses limitations of existing video captioning methods, which either produce overly abstract descriptions or lack object-level precision, enabling fine-grained, object-specific descriptions while maintaining temporal coherence and spatial accuracy. The GitHub repository for this project is available at https://github.com/yunlong10/CAT-V △ Less

Submitted 8 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

arXiv:2504.05276 [pdf, ps, other]

Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

Authors: Yucheng Chu, Peng He, Hang Li, Haoyu Han, Kaiqi Yang, Yu Xue, Tingting Li, Joseph Krajcik, Jiliang Tang

Abstract: Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific require… ▽ More Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific requirements and hinder their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results in a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains. △ Less

Submitted 3 June, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

Comments: EDM 2025 Short Paper

arXiv:2504.05239 [pdf, other]

LLM-based Automated Grading with Human-in-the-Loop

Authors: Hang Li, Yucheng Chu, Kaiqi Yang, Yasemin Copur-Gencturk, Jiliang Tang

Abstract: The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the field of education. Among various applications, automatic short answer grading (ASAG), which focuses on evaluating open-ended textual responses, has seen remarkable progress with the introduction of LLMs. These models not only enhance grading performance com… ▽ More The rise of artificial intelligence (AI) technologies, particularly large language models (LLMs), has brought significant advancements to the field of education. Among various applications, automatic short answer grading (ASAG), which focuses on evaluating open-ended textual responses, has seen remarkable progress with the introduction of LLMs. These models not only enhance grading performance compared to traditional ASAG approaches but also move beyond simple comparisons with predefined "golden" answers, enabling more sophisticated grading scenarios, such as rubric-based evaluation. However, existing LLM-powered methods still face challenges in achieving human-level grading performance in rubric-based assessments due to their reliance on fully automated approaches. In this work, we explore the potential of LLMs in ASAG tasks by leveraging their interactive capabilities through a human-in-the-loop (HITL) approach. Our proposed framework, GradeHITL, utilizes the generative properties of LLMs to pose questions to human experts, incorporating their insights to refine grading rubrics dynamically. This adaptive process significantly improves grading accuracy, outperforming existing methods and bringing ASAG closer to human-level evaluation. △ Less

Submitted 28 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

arXiv:2504.05199 [pdf, ps, other]

Equivalence Theorems and Double-Copy Structure in Scattering Amplitudes of Massive Kaluza-Klein States with Matter Interactions

Authors: Kezhu Guo, Yanfeng Hang

Abstract: We investigate the scattering amplitudes of massive Kaluza-Klein (KK) states in compactified five-dimensional warped gauge and gravity theories. Focusing on tree-level $2\to2$ processes, we analyze the leading-order amplitudes involving bulk KK matter fields and KK gauge/gravitational Goldstone bosons. By imposing the gauge theory equivalence theorem (GAET) and the gravitational equivalence theore… ▽ More We investigate the scattering amplitudes of massive Kaluza-Klein (KK) states in compactified five-dimensional warped gauge and gravity theories. Focusing on tree-level $2\to2$ processes, we analyze the leading-order amplitudes involving bulk KK matter fields and KK gauge/gravitational Goldstone bosons. By imposing the gauge theory equivalence theorem (GAET) and the gravitational equivalence theorem (GRET) within warped KK theories, we systematically reconstruct the leading-order amplitudes for physical KK gauge bosons and gravitons, thereby circumventing the intricate energy cancellations inherent in physical amplitudes. Within this framework, the correspondence between GAET and GRET arises as a direct manifestation of the leading-order double-copy relation in the high-energy expansion. This connection provides a foundation for extending the BCJ double-copy construction to four-point amplitudes involving bulk KK matter fields, and further generalizes to arbitrary $N$-point cases, enabling a systematic derivation of the corresponding gravitational amplitudes with consistent incorporation of KK matter fields at leading order. △ Less

Submitted 14 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

Comments: 31 pages. Incorporating general N-point discussion and new references. The typos have been corrected and the conclusion remains unchanged

arXiv:2504.05118 [pdf, other]

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

Authors: Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, TianTian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang , et al. (2 additional authors not shown)

Abstract: We present VAPO, Value-based Augmented Proximal Policy Optimization framework for reasoning models., a novel framework tailored for reasoning models within the value-based paradigm. Benchmarked the AIME 2024 dataset, VAPO, built on the Qwen 32B pre-trained model, attains a state-of-the-art score of $\mathbf{60.4}$. In direct comparison under identical experimental settings, VAPO outperforms the pr… ▽ More We present VAPO, Value-based Augmented Proximal Policy Optimization framework for reasoning models., a novel framework tailored for reasoning models within the value-based paradigm. Benchmarked the AIME 2024 dataset, VAPO, built on the Qwen 32B pre-trained model, attains a state-of-the-art score of $\mathbf{60.4}$. In direct comparison under identical experimental settings, VAPO outperforms the previously reported results of DeepSeek-R1-Zero-Qwen-32B and DAPO by more than 10 points. The training process of VAPO stands out for its stability and efficiency. It reaches state-of-the-art performance within a mere 5,000 steps. Moreover, across multiple independent runs, no training crashes occur, underscoring its reliability. This research delves into long chain-of-thought (long-CoT) reasoning using a value-based reinforcement learning framework. We pinpoint three key challenges that plague value-based methods: value model bias, the presence of heterogeneous sequence lengths, and the sparsity of reward signals. Through systematic design, VAPO offers an integrated solution that effectively alleviates these challenges, enabling enhanced performance in long-CoT reasoning tasks. △ Less

Submitted 10 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

arXiv:2504.05041 [pdf, ps, other]

Segmented Trajectory Optimization for Autonomous Parking in Unstructured Environments

Authors: Hang Yu, Renjie Li

Abstract: This paper presents a Segmented Trajectory Optimization (STO) method for autonomous parking, which refines an initial trajectory into a dynamically feasible and collision-free one using an iterative SQP-based approach. STO maintains the maneuver strategy of the high-level global planner while allowing curvature discontinuities at switching points to improve maneuver efficiency. To ensure safety, a… ▽ More This paper presents a Segmented Trajectory Optimization (STO) method for autonomous parking, which refines an initial trajectory into a dynamically feasible and collision-free one using an iterative SQP-based approach. STO maintains the maneuver strategy of the high-level global planner while allowing curvature discontinuities at switching points to improve maneuver efficiency. To ensure safety, a convex corridor is constructed via GJK-accelerated ellipse shrinking and expansion, serving as safety constraints in each iteration. Numerical simulations in perpendicular and reverse-angled parking scenarios demonstrate that STO enhances maneuver efficiency while ensuring safety. Moreover, computational performance confirms its practicality for real-world applications. △ Less

Submitted 7 April, 2025; originally announced April 2025.

Comments: 8 pages, 6 figures, submitted to IROS 2025

arXiv:2504.04748 [pdf, other]

Sharp threshold for network recovery from voter model dynamics

Authors: Hang Du, Seokmin Ha, Oriol Solé-Pi

Abstract: We investigate the problem of recovering a latent directed Erdős-Rényi graph $G^*\sim \mathcal G(n,p)$ from observations of discrete voter model trajectories on $G^*$, where $np$ grows polynomially in $n$. Given access to $M$ independent voter model trajectories evolving up to time $T$, we establish that $G^*$ can be recovered \emph{exactly} with probability at least $0.9$ by an \emph{efficient} a… ▽ More We investigate the problem of recovering a latent directed Erdős-Rényi graph $G^*\sim \mathcal G(n,p)$ from observations of discrete voter model trajectories on $G^*$, where $np$ grows polynomially in $n$. Given access to $M$ independent voter model trajectories evolving up to time $T$, we establish that $G^*$ can be recovered \emph{exactly} with probability at least $0.9$ by an \emph{efficient} algorithm, provided that \[ M \cdot \min\{T, n\} \geq C n^2 p^2 \log n \] holds for a sufficiently large constant $C$. Here, $M\cdot \min\{T,n\}$ can be interpreted as the approximate number of effective update rounds being observed, since the voter model on $G^*$ typically reaches consensus after $Θ(n)$ rounds, and no further information can be gained after this point. Furthermore, we prove an \emph{information-theoretic} lower bound showing that the above condition is tight up to a constant factor. Our results indicate that the recovery problem does not exhibit a statistical-computational gap. △ Less

Submitted 7 April, 2025; originally announced April 2025.

Comments: 58 pages, 3 figures

MSC Class: 60K35; 68Q87

arXiv:2504.04598 [pdf, other]

B4P: Simultaneous Grasp and Motion Planning for Object Placement via Parallelized Bidirectional Forests and Path Repair

Authors: Benjamin H. Leebron, Kejia Ren, Yiting Chen, Kaiyu Hang

Abstract: Robot pick and place systems have traditionally decoupled grasp, placement, and motion planning to build sequential optimization pipelines with the assumption that the individual components will be able to work together. However, this separation introduces sub-optimality, as grasp choices may limit or even prohibit feasible motions for a robot to reach the target placement pose, particularly in cl… ▽ More Robot pick and place systems have traditionally decoupled grasp, placement, and motion planning to build sequential optimization pipelines with the assumption that the individual components will be able to work together. However, this separation introduces sub-optimality, as grasp choices may limit or even prohibit feasible motions for a robot to reach the target placement pose, particularly in cluttered environments with narrow passages. To this end, we propose a forest-based planning framework to simultaneously find grasp configurations and feasible robot motions that explicitly satisfy downstream placement configurations paired with the selected grasps. Our proposed framework leverages a bidirectional sampling-based approach to build a start forest, rooted at the feasible grasp regions, and a goal forest, rooted at the feasible placement regions, to facilitate the search through randomly explored motions that connect valid pairs of grasp and placement trees. We demonstrate that the framework's inherent parallelism enables superlinear speedup, making it scalable for applications for redundant robot arms (e.g., 7 Degrees of Freedom) to work efficiently in highly cluttered environments. Extensive experiments in simulation demonstrate the robustness and efficiency of the proposed framework in comparison with multiple baselines under diverse scenarios. △ Less

Submitted 6 April, 2025; originally announced April 2025.

arXiv:2504.04421 [pdf, other]

Deliberate Planning of 3D Bin Packing on Packing Configuration Trees

Authors: Hang Zhao, Juzhan Xu, Kexiong Yu, Ruizhen Hu, Chenyang Zhu, Kai Xu

Abstract: Online 3D Bin Packing Problem (3D-BPP) has widespread applications in industrial automation. Existing methods usually solve the problem with limited resolution of spatial discretization, and/or cannot deal with complex practical constraints well. We propose to enhance the practical applicability of online 3D-BPP via learning on a novel hierarchical representation, packing configuration tree (PCT).… ▽ More Online 3D Bin Packing Problem (3D-BPP) has widespread applications in industrial automation. Existing methods usually solve the problem with limited resolution of spatial discretization, and/or cannot deal with complex practical constraints well. We propose to enhance the practical applicability of online 3D-BPP via learning on a novel hierarchical representation, packing configuration tree (PCT). PCT is a full-fledged description of the state and action space of bin packing which can support packing policy learning based on deep reinforcement learning (DRL). The size of the packing action space is proportional to the number of leaf nodes, making the DRL model easy to train and well-performing even with continuous solution space. We further discover the potential of PCT as tree-based planners in deliberately solving packing problems of industrial significance, including large-scale packing and different variations of BPP setting. A recursive packing method is proposed to decompose large-scale packing into smaller sub-trees while a spatial ensemble mechanism integrates local solutions into global. For different BPP variations with additional decision variables, such as lookahead, buffering, and offline packing, we propose a unified planning framework enabling out-of-the-box problem solving. Extensive evaluations demonstrate that our method outperforms existing online BPP baselines and is versatile in incorporating various practical constraints. The planning process excels across large-scale problems and diverse problem variations. We develop a real-world packing robot for industrial warehousing, with careful designs accounting for constrained placement and transportation stability. Our packing robot operates reliably and efficiently on unprotected pallets at 10 seconds per box. It achieves averagely 19 boxes per pallet with 57.4% space utilization for relatively large-size boxes. △ Less

Submitted 29 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

arXiv:2504.04381 [pdf, other]

Error analysis of a Euler finite element scheme for Natural convection model with variable density

Authors: Li Hang, Chenyang Li

Abstract: In this paper, we derive first-order Euler finite element discretization schemes for a time-dependent natural convection model with variable density (NCVD). The model is governed by the variable density Navier-Stokes equations coupled with a parabolic partial differential equation that describes the evolution of temperature. Stability and error estimate for the velocity, pressure, density and temp… ▽ More In this paper, we derive first-order Euler finite element discretization schemes for a time-dependent natural convection model with variable density (NCVD). The model is governed by the variable density Navier-Stokes equations coupled with a parabolic partial differential equation that describes the evolution of temperature. Stability and error estimate for the velocity, pressure, density and temperature in $L^2$-norm are proved by using finite element approximations in space and finite differences in time. Finally, the numerical results are showed to support the theoretical analysis. △ Less

Submitted 19 May, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

arXiv:2504.03687 [pdf, other]

Process Optimization and Deployment for Sensor-Based Human Activity Recognition Based on Deep Learning

Authors: Hanyu Liu, Ying Yu, Hang Xiao, Siyao Li, Xuze Li, Jiarui Li, Haotian Tang

Abstract: Sensor-based human activity recognition is a key technology for many human-centered intelligent applications. However, this research is still in its infancy and faces many unresolved challenges. To address these, we propose a comprehensive optimization process approach centered on multi-attention interaction. We first utilize unsupervised statistical feature-guided diffusion models for highly adap… ▽ More Sensor-based human activity recognition is a key technology for many human-centered intelligent applications. However, this research is still in its infancy and faces many unresolved challenges. To address these, we propose a comprehensive optimization process approach centered on multi-attention interaction. We first utilize unsupervised statistical feature-guided diffusion models for highly adaptive data enhancement, and introduce a novel network architecture-Multi-branch Spatiotemporal Interaction Network, which uses multi-branch features at different levels to effectively Sequential ), which uses multi-branch features at different levels to effectively Sequential spatio-temporal interaction to enhance the ability to mine advanced latent features. In addition, we adopt a multi-loss function fusion strategy in the training phase to dynamically adjust the fusion weights between batches to optimize the training results. Finally, we also conducted actual deployment on embedded devices to extensively test the practical feasibility of the proposed method in existing work. We conduct extensive testing on three public datasets, including ablation studies, comparisons of related work, and embedded deployments. △ Less

Submitted 22 March, 2025; originally announced April 2025.

arXiv:2504.03026 [pdf, other]

HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations

Authors: Yiran Xu, Siqi Xie, Zhuofang Li, Harris Shadmany, Yinxiao Li, Luciano Sbaiz, Miaosen Wang, Junjie Ke, Jose Lezama, Hang Qi, Han Zhang, Jesse Berent, Ming-Hsuan Yang, Irfan Essa, Jia-Bin Huang, Feng Yang

Abstract: Image retargeting aims to change the aspect-ratio of an image while maintaining its content and structure with less visual artifacts. Existing methods still generate many artifacts or fail to maintain original content or structure. To address this, we introduce HALO, an end-to-end trainable solution for image retargeting. Since humans are more sensitive to distortions in salient areas than non-sal… ▽ More Image retargeting aims to change the aspect-ratio of an image while maintaining its content and structure with less visual artifacts. Existing methods still generate many artifacts or fail to maintain original content or structure. To address this, we introduce HALO, an end-to-end trainable solution for image retargeting. Since humans are more sensitive to distortions in salient areas than non-salient areas of an image, HALO decomposes the input image into salient/non-salient layers and applies different wrapping fields to different layers. To further minimize the structure distortion in the output images, we propose perceptual structure similarity loss which measures the structure similarity between input and output images and aligns with human perception. Both quantitative results and a user study on the RetargetMe dataset show that HALO achieves SOTA. Especially, our method achieves an 18.4% higher user preference compared to the baselines on average. △ Less

Submitted 3 April, 2025; originally announced April 2025.

arXiv:2504.02296 [pdf, other]

Exceedance and force of centrality for functional data

Authors: Poorbita Kundu, Hang Zhou, Hans-Georg Müller

Abstract: Exceedance refers to instances where a dynamic process surpasses given thresholds, e.g., the occurrence of a heat wave. We propose a novel exceedance framework for functional data, where each observed random trajectory is transformed into an exceedance function, which quantifies exceedance durations as a function of threshold levels. An inherent relationship between exceedance functions and probab… ▽ More Exceedance refers to instances where a dynamic process surpasses given thresholds, e.g., the occurrence of a heat wave. We propose a novel exceedance framework for functional data, where each observed random trajectory is transformed into an exceedance function, which quantifies exceedance durations as a function of threshold levels. An inherent relationship between exceedance functions and probability distributions makes it possible to draw on distributional data analysis techniques such as Fréchet regression to study the dependence of exceedances on Euclidean predictors, e.g., calendar year when the exceedances are observed. We use local linear estimators to obtain exceedance functions from discretely observed functional data with noise and study the convergence of the proposed estimators. New concepts of interest include the force of centrality that quantifies the propensity of a system to revert to lower levels when a given threshold has been exceeded, conditional exceedance functions when conditioning on Euclidean covariates, and threshold exceedance functions, which characterize the size of exceedance sets in dependence on covariates for any fixed threshold. We establish consistent estimation with rates of convergence for these targets. The practical merits of the proposed methodology are illustrated through simulations and applications for annual temperature curves and medfly activity profiles. △ Less

Submitted 3 April, 2025; originally announced April 2025.

arXiv:2504.01934 [pdf, other]

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Authors: Runhui Huang, Chunwei Wang, Junwei Yang, Guansong Lu, Yunlong Yuan, Jianhua Han, Lu Hou, Wei Zhang, Lanqing Hong, Hengshuang Zhao, Hang Xu

Abstract: We present ILLUME+ that leverages dual visual tokenization and a diffusion decoder to improve both deep semantic understanding and high-fidelity image generation. Existing unified models have struggled to simultaneously handle the three fundamental capabilities in a unified model: understanding, generation, and editing. Models like Chameleon and EMU3 utilize VQGAN for image discretization, due to… ▽ More We present ILLUME+ that leverages dual visual tokenization and a diffusion decoder to improve both deep semantic understanding and high-fidelity image generation. Existing unified models have struggled to simultaneously handle the three fundamental capabilities in a unified model: understanding, generation, and editing. Models like Chameleon and EMU3 utilize VQGAN for image discretization, due to the lack of deep semantic interaction, they lag behind specialist models like LLaVA in visual understanding tasks. To mitigate this, LaViT and ILLUME employ semantic encoders for tokenization, but they struggle with image editing due to poor texture preservation. Meanwhile, Janus series decouples the input and output image representation, limiting their abilities to seamlessly handle interleaved image-text understanding and generation. In contrast, ILLUME+ introduces a unified dual visual tokenizer, DualViTok, which preserves both fine-grained textures and text-aligned semantics while enabling a coarse-to-fine image representation strategy for multimodal understanding and generation. Additionally, we employ a diffusion model as the image detokenizer for enhanced generation quality and efficient super-resolution. ILLUME+ follows a continuous-input, discrete-output scheme within the unified MLLM and adopts a progressive training procedure that supports dynamic resolution across the vision tokenizer, MLLM, and diffusion decoder. This design allows for flexible and efficient context-aware image editing and generation across diverse tasks. ILLUME+ (3B) exhibits competitive performance against existing unified MLLMs and specialized models across multimodal understanding, generation, and editing benchmarks. With its strong performance, ILLUME+ provides a scalable and versatile foundation for future multimodal applications. Project Page: https://illume-unified-mllm.github.io/. △ Less

Submitted 3 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.01752 [pdf, ps, other]

doi 10.1109/TMC.2025.3557838

A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Authors: Jinhao Ouyang, Yuan Liu, Hang Liu

Abstract: Federated learning (FL) enables distributed devices to train a shared machine learning (ML) model collaboratively while protecting their data privacy. However, the resource-limited mobile devices suffer from intensive computation-and-communication costs of model parameters. In this paper, we observe the phenomenon that the model parameters tend to be stabilized long before convergence during train… ▽ More Federated learning (FL) enables distributed devices to train a shared machine learning (ML) model collaboratively while protecting their data privacy. However, the resource-limited mobile devices suffer from intensive computation-and-communication costs of model parameters. In this paper, we observe the phenomenon that the model parameters tend to be stabilized long before convergence during training process. Based on this observation, we propose a two-timescale FL framework by joint optimization of freezing stabilized parameters and controlling transmit power for the unstable parameters to balance the energy consumption and convergence. First, we analyze the impact of model parameter freezing and unreliable transmission on the convergence rate. Next, we formulate a two-timescale optimization problem of parameter freezing percentage and transmit power to minimize the model convergence error subject to the energy budget. To solve this problem, we decompose it into parallel sub-problems and decompose each sub-problem into two different timescales problems using the Lyapunov optimization method. The optimal parameter freezing and power control strategies are derived in an online fashion. Experimental results demonstrate the superiority of the proposed scheme compared with the benchmark schemes. △ Less

Submitted 2 April, 2025; originally announced April 2025.

Comments: 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting, republishing, or reuse in other works. This work has been accepted to IEEE Transactions on Mobile Computing

arXiv:2504.01448 [pdf, ps, other]

LLM-VPRF: Large Language Model Based Vector Pseudo Relevance Feedback

Authors: Hang Li, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Abstract: Vector Pseudo Relevance Feedback (VPRF) has shown promising results in improving BERT-based dense retrieval systems through iterative refinement of query representations. This paper investigates the generalizability of VPRF to Large Language Model (LLM) based dense retrievers. We introduce LLM-VPRF and evaluate its effectiveness across multiple benchmark datasets, analyzing how different LLMs impa… ▽ More Vector Pseudo Relevance Feedback (VPRF) has shown promising results in improving BERT-based dense retrieval systems through iterative refinement of query representations. This paper investigates the generalizability of VPRF to Large Language Model (LLM) based dense retrievers. We introduce LLM-VPRF and evaluate its effectiveness across multiple benchmark datasets, analyzing how different LLMs impact the feedback mechanism. Our results demonstrate that VPRF's benefits successfully extend to LLM architectures, establishing it as a robust technique for enhancing dense retrieval performance regardless of the underlying models. This work bridges the gap between VPRF with traditional BERT-based dense retrievers and modern LLMs, while providing insights into their future directions. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.01391 [pdf]

Enabling Continuous THz Band Coverage via Precise Electron Beam Tailoring in Free-electron Lasers

Authors: Yin Kang, Tong Li, Zhen Wang, Yue Wang, Cheng Yu, Weiyi Yin, Zhangfeng Gao, Hanghua Xu, Hang Luo, Xiaofan Wang, Jian Chen, Taihe Lan, Xiaoqing Liu, Jinguo Wang, Huan Zhao, Fei Gao, Liping Sun, YanYan Zhu, Yongmei Wen, Qili Tian, Chenye Xu, Xingtao Wang, Jiaqiang Xu, Zheng Qi, Tao Liu , et al. (6 additional authors not shown)

Abstract: High-power, continuously tunable narrowband terahertz (THz) sources are essential for advancing nonlinear optics, THz-driven material dynamics, and ultrafast spectroscopy. Conventional techniques typically impose a trade-off between pulse energy and frequency tunability. Here, we introduce a novel free-electron laser approach that overcomes these limitations by pre-modulating a relativistic electr… ▽ More High-power, continuously tunable narrowband terahertz (THz) sources are essential for advancing nonlinear optics, THz-driven material dynamics, and ultrafast spectroscopy. Conventional techniques typically impose a trade-off between pulse energy and frequency tunability. Here, we introduce a novel free-electron laser approach that overcomes these limitations by pre-modulating a relativistic electron beam with a frequency-beating laser pulse and leveraging bunch compression along with collective effects to enhance microbunching. Experimental results demonstrate that this technique generates narrowband THz emission with continuous frequency tunability from 7.8 to 30.8THz, achieving pulse energies up to 385μJ while maintaining spectral bandwidths between 7.7% and 14.7%. Moreover, the method exhibits exceptional robustness and scalability, highlighting its unique ability to bridge the long-standing THz gap and offering a promising solution for diverse cutting-edge scientific applications. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.00562 [pdf, other]

Diffusion Model-Based Size Variable Virtual Try-On Technology and Evaluation Method

Authors: Shufang Zhang, Hang Qian, Minxue Ni, Yaxuan Li, Wenxin Ding, Jun Liu

Abstract: With the rapid development of e-commerce, virtual try-on technology has become an essential tool to satisfy consumers' personalized clothing preferences. Diffusion-based virtual try-on systems aim to naturally align garments with target individuals, generating realistic and detailed try-on images. However, existing methods overlook the importance of garment size variations in meeting personalized… ▽ More With the rapid development of e-commerce, virtual try-on technology has become an essential tool to satisfy consumers' personalized clothing preferences. Diffusion-based virtual try-on systems aim to naturally align garments with target individuals, generating realistic and detailed try-on images. However, existing methods overlook the importance of garment size variations in meeting personalized consumer needs. To address this, we propose a novel virtual try-on method named SV-VTON, which introduces garment sizing concepts into virtual try-on tasks. The SV-VTON method first generates refined masks for multiple garment sizes, then integrates these masks with garment images at varying proportions, enabling virtual try-on simulations across different sizes. In addition, we developed a specialized size evaluation module to quantitatively assess the accuracy of size variations. This module calculates differences between generated size increments and international sizing standards, providing objective measurements of size accuracy. To further validate SV-VTON's generalization capability across different models, we conducted experiments on multiple SOTA Diffusion models. The results demonstrate that SV-VTON consistently achieves precise multi-size virtual try-on across various SOTA models, and validates the effectiveness and rationality of the proposed method, significantly fulfilling users' personalized multi-size virtual try-on requirements. △ Less

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2504.00521 [pdf, other]

Automated detection of atomicity violations in large-scale systems

Authors: Hang He, Yixing Luo, Chengcheng Wan, Ting Su, Haiying Sun, Geguang Pu

Abstract: Atomicity violations in interrupt-driven programs pose a significant threat to software safety in critical systems. These violations occur when the execution sequence of operations on shared resources is disrupted by asynchronous interrupts. Detecting atomicity violations is challenging due to the vast program state space, application-level code dependencies, and complex domain-specific knowledge.… ▽ More Atomicity violations in interrupt-driven programs pose a significant threat to software safety in critical systems. These violations occur when the execution sequence of operations on shared resources is disrupted by asynchronous interrupts. Detecting atomicity violations is challenging due to the vast program state space, application-level code dependencies, and complex domain-specific knowledge. We propose Clover, a hybrid framework that integrates static analysis with large language model (LLM) agents to detect atomicity violations in real-world programs. Clover first performs static analysis to extract critical code snippets and operation information. It then initiates a multi-agent process, where the expert agent leverages domain-specific knowledge to detect atomicity violations, which are subsequently validated by the judge agent. Evaluations on RaceBench 2.1, SV-COMP, and RWIP demonstrate that Clover achieves a precision/recall of 92.3%/86.6%, outperforming existing approaches by 27.4-118.2% on F1-score. △ Less

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2503.24328 [pdf]

doi 10.11896/jsjkx.190600152

Contextual Preference Collaborative Measure Framework Based on Belief System

Authors: Hang Yu, Wei Wei, Zheng Tan, Jing-lei Liu

Abstract: To reduce the human intervention in the preference measure process,this article proposes a preference collaborative measure framework based on an updated belief system,which is also capable of improving the accuracy and efficiency of preferen-ce measure algorithms.Firstly,the distance of rules and the average internal distance of rulesets are proposed for specifying the relationship between the ru… ▽ More To reduce the human intervention in the preference measure process,this article proposes a preference collaborative measure framework based on an updated belief system,which is also capable of improving the accuracy and efficiency of preferen-ce measure algorithms.Firstly,the distance of rules and the average internal distance of rulesets are proposed for specifying the relationship between the rules.For discovering the most representative preferences that are common in all users,namely common preference,a algorithm based on average internal distance of ruleset,PRA algorithm,is proposed,which aims to finish the discoveryprocess with minimum information loss rate.Furthermore,the concept of Common belief is proposed to update the belief system,and the common preferences are the evidences of updated belief system.Then,under the belief system,the proposed belief degree and deviation degree are used to determine whether a rule confirms the belief system or not and classify the preference rules into two kinds(generalized or personalized),and eventually filters out Top-K interesting rules relying on belief degree and deviation degree.Based on above,a scalable interestingness calculation framework that can apply various formulas is proposed for accurately calculating interestingness in different conditions.At last,IMCos algorithm and IMCov algorithm are proposed as exemplars to verify the accuracy and efficiency of the framework by using weighted cosine similarity and correlation coefficients as belief degree.In experiments,the proposed algorithms are compared to two state-of-the-art algorithms and the results show that IMCos and IMCov outperform than the other two in most aspects. △ Less

Submitted 31 March, 2025; originally announced March 2025.

Comments: in Chinese language

arXiv:2503.23496 [pdf, other]

FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption

Authors: Shangyi Shi, Husheng Han, Jianan Mu, Xinyao Zheng, Ling Liang, Hang Lu, Zidong Du, Xiaowei Li, Xing Hu, Qi Guo

Abstract: Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelera… ▽ More Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelerators, which fail to fully utilize the available near-memory bandwidth. In this work, we propose FlexMem, a near-memory accelerator featuring high-parallel computational units with varying memory access strides and interconnect topologies to effectively handle irregular memory access patterns. Furthermore, we design polynomial and ciphertext-level dataflows to efficiently utilize near-memory bandwidth under varying degrees of polynomial parallelism and enhance parallel performance. Experimental results demonstrate that FlexMem achieves 1.12 times of performance improvement over state-of-the-art near-memory architectures, with 95.7% of near-memory bandwidth utilization. △ Less

Submitted 30 March, 2025; originally announced March 2025.

Comments: 9 pages,ICCAD

arXiv:2503.23367 [pdf, other]

FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning

Authors: Hang Guo, Yawei Li, Taolin Zhang, Jiangshan Wang, Tao Dai, Shu-Tao Xia, Luca Benini

Abstract: Visual Autoregressive (VAR) modeling has gained popularity for its shift towards next-scale prediction. However, existing VAR paradigms process the entire token map at each scale step, leading to the complexity and runtime scaling dramatically with image resolution. To address this challenge, we propose FastVAR, a post-training acceleration method for efficient resolution scaling with VARs. Our ke… ▽ More Visual Autoregressive (VAR) modeling has gained popularity for its shift towards next-scale prediction. However, existing VAR paradigms process the entire token map at each scale step, leading to the complexity and runtime scaling dramatically with image resolution. To address this challenge, we propose FastVAR, a post-training acceleration method for efficient resolution scaling with VARs. Our key finding is that the majority of latency arises from the large-scale step where most tokens have already converged. Leveraging this observation, we develop the cached token pruning strategy that only forwards pivotal tokens for scale-specific modeling while using cached tokens from previous scale steps to restore the pruned slots. This significantly reduces the number of forwarded tokens and improves the efficiency at larger resolutions. Experiments show the proposed FastVAR can further speedup FlashAttention-accelerated VAR by 2.7$\times$ with negligible performance drop of <1%. We further extend FastVAR to zero-shot generation of higher resolution images. In particular, FastVAR can generate one 2K image with 15GB memory footprints in 1.5s on a single NVIDIA 3090 GPU. Code is available at https://github.com/csguoh/FastVAR. △ Less

Submitted 6 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

Comments: Technical Report

Showing 151–200 of 3,624 results for author: Hang