-
NeoRL-2: Near Real-World Benchmarks for Offline Reinforcement Learning with Extended Realistic Scenarios
Authors:
Songyi Gao,
Zuolin Tu,
Rong-Jun Qin,
Yi-Hao Sun,
Xiong-Hui Chen,
Yang Yu
Abstract:
Offline reinforcement learning (RL) aims to learn from historical data without requiring (costly) access to the environment. To facilitate offline RL research, we previously introduced NeoRL, which highlighted that datasets from real-world tasks are often conservative and limited. With years of experience applying offline RL to various domains, we have identified additional real-world challenges. These include extremely conservative data distributions produced by deployed control systems, delayed action effects caused by high-latency transitions, external factors arising from the uncontrollable variance of transitions, and global safety constraints that are difficult to evaluate during the decision-making process. These challenges are underrepresented in previous benchmarks but frequently occur in real-world tasks. To address this, we constructed the extended Near Real-World Offline RL Benchmark (NeoRL-2), which consists of 7 datasets from 7 simulated tasks along with their corresponding evaluation simulators. Benchmarking results from state-of-the-art offline RL approaches demonstrate that current methods often struggle to outperform the data-collection behavior policy, highlighting the need for more effective methods. We hope NeoRL-2 will accelerate the development of reinforcement learning algorithms for real-world applications. The benchmark project page is available at https://github.com/polixir/NeoRL2.
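One of the challenges listed above, delayed action effects from high-latency transitions, can be illustrated with a toy wrapper in which an action issued at step t only reaches the dynamics at step t + delay. This is a minimal sketch under stated assumptions: `DelayedActionEnv`, `step_fn`, and `noop_action` are hypothetical names, and the wrapper is not part of the NeoRL-2 code base or API.

```python
from collections import deque


class DelayedActionEnv:
    """Hypothetical wrapper: an action issued at step t takes effect at step t + delay.

    `step_fn(state, action) -> (next_state, reward)` is an assumed toy transition
    function, not a NeoRL-2 interface.
    """

    def __init__(self, step_fn, initial_state, delay, noop_action):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.step_fn = step_fn
        self.state = initial_state
        # Pre-fill the queue so the first `delay` steps apply the no-op action.
        self.pending = deque([noop_action] * delay)

    def step(self, action):
        # Queue the new action and apply the one issued `delay` steps ago.
        self.pending.append(action)
        effective = self.pending.popleft()
        self.state, reward = self.step_fn(self.state, effective)
        return self.state, reward


# Usage: a 1-D integrator whose actions arrive two steps late.
env = DelayedActionEnv(lambda s, a: (s + a, float(s + a)), 0, delay=2, noop_action=0)
states = [env.step(1)[0] for _ in range(4)]
```

With a delay of 2, the first two states stay at 0 even though the agent pushes with action 1 from the start; a policy learned from such logged data must attribute rewards to actions taken several steps earlier.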
Submitted 24 March, 2025;
originally announced March 2025.
-
A semantic communication-based workload-adjustable transceiver for wireless AI-generated content (AIGC) delivery
Authors:
Runze Cheng,
Yao Sun,
Lan Zhang,
Lei Feng,
Lei Zhang,
Muhammad Ali Imran
Abstract:
With the significant advances in generative AI (GAI) and the proliferation of mobile devices, providing high-quality AI-generated content (AIGC) services via wireless networks is becoming an important future direction. However, the primary challenges of AIGC service delivery in wireless networks lie in unstable channels, limited bandwidth resources, and unevenly distributed computational resources. In this paper, we employ semantic communication (SemCom) in diffusion-based GAI models to propose a Resource-aware wOrkload-adjUstable TransceivEr (ROUTE) for AIGC delivery in dynamic wireless networks. Specifically, to relieve the communication resource bottleneck, SemCom is utilized to prioritize the semantic information of the generated content. Then, to improve computational resource utilization at both the edge and local devices and to reduce AIGC semantic distortion during transmission, modified diffusion-based models are applied to adjust the computing workload and semantic density in cooperative content generation. Simulations verify the superiority of the proposed ROUTE in terms of latency and content quality compared to conventional AIGC approaches.
Submitted 24 March, 2025;
originally announced March 2025.
-
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Authors:
Yulong Zheng,
Zicheng Jiang,
Shengfeng He,
Yandu Sun,
Junyu Dong,
Huaidong Zhang,
Yong Du
Abstract:
Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have noticeably advanced photo-realistic novel view synthesis using images from densely spaced camera viewpoints. However, these methods struggle in few-shot scenarios due to limited supervision. In this paper, we present NexusGS, a 3DGS-based approach that enhances novel view synthesis from sparse-view images by directly embedding depth information into point clouds, without relying on complex manual regularizations. Exploiting the inherent epipolar geometry of 3DGS, our method introduces a novel point cloud densification strategy that initializes 3DGS with a dense point cloud, reducing randomness in point placement while preventing over-smoothing and overfitting. Specifically, NexusGS comprises three key steps: Epipolar Depth Nexus, Flow-Resilient Depth Blending, and Flow-Filtered Depth Pruning. These steps leverage optical flow and camera poses to compute accurate depth maps, while mitigating the inaccuracies often associated with optical flow. By incorporating epipolar depth priors, NexusGS ensures reliable dense point cloud coverage and supports stable 3DGS training under sparse-view conditions. Experiments demonstrate that NexusGS significantly enhances depth accuracy and rendering quality, surpassing state-of-the-art methods by a considerable margin. Furthermore, we validate the superiority of our generated point clouds by substantially boosting the performance of competing methods. Project page: https://usmizuki.github.io/NexusGS/.
Submitted 24 March, 2025;
originally announced March 2025.
-
Observation of the decay $ψ(3686)\rightarrow Σ^{0}\barΣ^{0}ω$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (695 additional authors not shown)
Abstract:
Using a dataset of $(27.12\pm 0.14)\times 10^{8}$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, we report the first observation of the decay $ψ(3686)\toΣ^{0}\barΣ^{0}ω$ with a statistical significance of 8.9$σ$. The measured branching fraction is $(1.24 \pm 0.16_{\textrm{stat}} \pm 0.11_{\textrm{sys}}) \times 10^{-5}$, where the first uncertainty is statistical and the second is systematic. Additionally, we investigate potential intermediate states in the invariant mass distributions of $Σ^{0}ω$, $\barΣ^{0}ω$ and $Σ^{0}\barΣ^{0}$. A hint of a resonance is observed in the invariant mass distribution of $M_{Σ^{0}(\barΣ^{0})ω}$, located around 2.06 GeV/$c^2$, with a significance of 2.5$σ$.
Submitted 24 March, 2025;
originally announced March 2025.
-
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Authors:
Qiao Liang,
Yanjiang Liu,
Ben He,
Yaojie Lu,
Hongyu Lin,
Jia Zheng,
Xianpei Han,
Le Sun,
Yingfei Sun
Abstract:
Does the prior knowledge of the vision encoder constrain the capability boundary of Multi-modal Large Language Models (MLLMs)? While most existing research treats MLLMs as unified systems optimized through end-to-end training, the impact of the vision encoder's prior knowledge is seldom investigated. In this work, we introduce a novel metric, $Rank_e$, to quantify the effect of the vision encoder's prior knowledge on MLLM performance. Our analysis reveals a positive correlation between prior knowledge and MLLM performance. Moreover, we find that domain-specific fine-tuning using solely end-to-end visual question answering (VQA) data is insufficient, particularly for entities with low inherent visual prior knowledge. To address this issue, we propose VisPRE (Vision Prior Remediation), a two-stage training framework that explicitly incorporates prior knowledge at the vision encoder level. Experimental results demonstrate that augmenting the vision encoder's prior knowledge substantially boosts the visual understanding capabilities of MLLMs, offering a novel and effective strategy for improving performance, especially in scenarios involving uncommon visual entities.
Submitted 23 March, 2025;
originally announced March 2025.
-
Cache-Aware Cooperative Multicast Beamforming in Dynamic Satellite-Terrestrial Networks
Authors:
Shuo Yuan,
Yaohua Sun,
Mugen Peng
Abstract:
With the burgeoning demand for data-intensive services, satellite-terrestrial networks (STNs) face increasing backhaul link congestion, deteriorating user quality of service (QoS), and escalating power consumption. Cache-aided STNs are acknowledged as a promising paradigm for accelerating content delivery to users and alleviating the load on backhaul links. However, the dynamic nature of low Earth orbit (LEO) satellites and the complex interference among satellite beams and terrestrial base stations pose challenges in effectively managing limited edge resources. To address these issues, this paper proposes a method for dynamically scheduling caching and communication resources, aiming to reduce network costs in terms of transmission power consumption and backhaul traffic while meeting user QoS demands and resource constraints. We formulate a mixed-timescale problem to jointly optimize cache placement, LEO satellite beam direction, and cooperative multicast beamforming among satellite beams and base stations. To tackle this intricate problem, we propose a two-stage solution framework in which the primary problem is decoupled into a short-term content delivery subproblem and a long-term cache placement subproblem. The former is solved by an alternating optimization approach that combines whale optimization and successive convex approximation, given the current cache placement state, while cache content in STNs is updated using an iterative algorithm that utilizes historical information. Simulation results demonstrate the effectiveness of the proposed algorithms, showing their convergence and reducing transmission power consumption and backhaul traffic by up to 52%.
Submitted 22 March, 2025;
originally announced March 2025.
-
Satellite-Terrestrial Integrated Fog Networks: Architecture, Technologies, and Challenges
Authors:
Shuo Yuan,
Mugen Peng,
Yaohua Sun
Abstract:
In the evolution of sixth-generation (6G) mobile communication networks, satellite-terrestrial integrated networks emerge as a promising paradigm, characterized by their wide coverage and reliable transmission capabilities. By integrating with cloud-based terrestrial mobile communication networks, the limitations of low Earth orbit (LEO) satellites, such as insufficient onboard computing capabilities and limited inter-satellite link capacity, can be addressed. In addition, to efficiently respond to the diverse integrated tasks of communication, remote sensing, and navigation, LEO constellations need to be capable of autonomous networking. To this end, this article presents a satellite-terrestrial integrated fog network for 6G. Its system architecture and key technologies are introduced to achieve flexible collaboration between fog satellites and terrestrial cloud computing centers. In particular, key techniques with diverse challenges and their corresponding solutions are discussed, including integrated waveform design and resource management based on fog satellite onboard processing, as well as mobility management and native artificial intelligence based on cloud-fog collaboration. Finally, future challenges and open issues are outlined.
Submitted 22 March, 2025;
originally announced March 2025.
-
Strain tuning of charge density wave and Mott-insulating states in monolayer VTe2
Authors:
Wenqian Tu,
Run Lv,
Dingfu Shao,
Yuping Sun,
Wenjian Lu
Abstract:
Monolayer vanadium ditelluride (VTe2) exhibits a $2\sqrt{3}\times 2\sqrt{3}$ charge density wave (CDW) order intertwined with a Mott-insulating state. However, the physical mechanisms driving the emergence of the CDW order and the Mott-insulating state are still not well understood. In this study, we systematically investigate the electronic band structure, phonon dispersion, and electron-phonon coupling (EPC) of monolayer VTe2 under applied biaxial strain. Our results reveal that the CDW phase is metastable in free-standing monolayer VTe2 and becomes stabilized under compressive strain below ε = -2%. The formation of the CDW order originates predominantly from a strong EPC effect rather than from Fermi surface nesting. The narrowing of the bandwidth due to the CDW order, combined with the correlation effect of the V-3d orbitals, drives the system into a Mott-insulating state. Furthermore, we find that tensile strain suppresses the CDW order and induces a superconducting state above a critical strain threshold (ε = 2%). These findings enhance our understanding of correlation physics in monolayer VTe2 and provide a pathway for strain-engineered manipulation of quantum phases in two-dimensional transition metal dichalcogenides.
Submitted 6 April, 2025; v1 submitted 22 March, 2025;
originally announced March 2025.
-
Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion
Authors:
Lei Guo,
Wei Chen,
Yuxuan Sun,
Bo Ai,
Nikolaos Pappas,
Tony Quek
Abstract:
Obtaining high-resolution hyperspectral images (HR-HSI) is costly and data-intensive, making it necessary to fuse low-resolution hyperspectral images (LR-HSI) with high-resolution RGB images (HR-RGB) for practical applications. However, traditional fusion techniques, which integrate detailed information into the reconstruction, significantly increase bandwidth consumption compared to directly transmitting raw data. To overcome these challenges, we propose a hierarchy-aware and channel-adaptive semantic communication approach for bandwidth-limited data fusion. A hierarchical correlation module is proposed to preserve both the overall structural information and the details of the image required for super-resolution. This module efficiently combines deep semantic and shallow features from LR-HSI and HR-RGB. To further reduce bandwidth usage while preserving reconstruction quality, a channel-adaptive attention mechanism based on Transformer is proposed to dynamically integrate and transmit the deep and shallow features, enabling efficient data transmission and high-quality HR-HSI reconstruction. Experimental results on the CAVE and Washington DC Mall datasets demonstrate that our method outperforms single-source transmission, achieving up to a 2 dB improvement in peak signal-to-noise ratio (PSNR). Additionally, it reduces bandwidth consumption by two-thirds, confirming its effectiveness in bandwidth-constrained environments for HR-HSI reconstruction tasks.
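To put the reported "up to 2 dB" PSNR gain in perspective, the sketch below uses the standard definition PSNR = 10·log10(MAX²/MSE) on a flat list of pixel values. It is illustrative only: the function names are hypothetical, and how the paper averages PSNR over hyperspectral bands is not stated in the abstract, so per-band averaging is left out.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """PSNR in dB between two equal-length sequences of pixel values in [0, max_value]."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE must shrink to raise PSNR by `gain_db` decibels."""
    return 10.0 ** (-gain_db / 10.0)
```

Because PSNR is logarithmic in the MSE, `mse_ratio_for_gain(2.0)` is about 0.631; a 2 dB improvement therefore corresponds to roughly a 37% reduction in mean squared reconstruction error.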
Submitted 22 March, 2025;
originally announced March 2025.
-
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Authors:
Yuchen Sun,
Shanhui Zhao,
Tao Yu,
Hao Wen,
Samith Va,
Mengwei Xu,
Yuanchun Li,
Chongyang Zhang
Abstract:
GUI agents hold significant potential to enhance the experience and efficiency of human-device interaction. However, current methods face challenges in generalizing across applications (apps) and tasks, primarily due to two fundamental limitations in existing datasets. First, these datasets overlook developer-induced structural variations among apps, limiting the transferability of knowledge across diverse software environments. Second, many of them focus solely on navigation tasks, which restricts their capacity to represent comprehensive software architectures and complex user interactions. To address these challenges, we introduce GUI-Xplore, a dataset meticulously designed to enhance cross-application and cross-task generalization via an exploration-and-reasoning framework. GUI-Xplore integrates pre-recorded exploration videos providing contextual insights, alongside five hierarchically structured downstream tasks designed to comprehensively evaluate GUI agent capabilities. To fully exploit GUI-Xplore's unique features, we propose Xplore-Agent, a GUI agent framework that combines Action-aware GUI Modeling with Graph-Guided Environment Reasoning. Further experiments indicate that Xplore-Agent achieves a 10% improvement over existing methods in unfamiliar environments, yet there remains significant potential for further enhancement towards truly generalizable GUI agents.
Submitted 22 March, 2025;
originally announced March 2025.
-
Audio-Enhanced Vision-Language Modeling with Latent Space Broadening for High Quality Data Expansion
Authors:
Yu Sun,
Yin Li,
Ruixiao Sun,
Chunhui Liu,
Fangming Zhou,
Ze Jin,
Linjie Wang,
Xiang Shen,
Zhuolin Hao,
Hongyu Xiong
Abstract:
Transformer-based multimodal models are widely used in industrial-scale recommendation, search, and advertising systems for content understanding and relevance ranking. Enhancing labeled training data quality and cross-modal fusion significantly improves model performance, influencing key metrics such as quality view rates and ad revenue. High-quality annotations are crucial for advancing content modeling, yet traditional statistics-based active learning (AL) methods face limitations: they struggle to detect overconfident misclassifications and are less effective at distinguishing semantically similar items in deep neural networks. Additionally, audio information plays an increasing role, especially on short-video platforms, yet most pre-trained multimodal architectures primarily focus on text and images. While training from scratch across all three modalities is possible, it sacrifices the benefits of leveraging existing pre-trained vision-language (VL) and audio models. To address these challenges, we propose kNN-based Latent Space Broadening (LSB) to enhance AL efficiency, and Vision-Language Modeling with Audio Enhancement (VLMAE), a mid-fusion approach that integrates audio into VL models. The system has been deployed in production, leading to significant business gains.
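The abstract does not spell out how kNN-based Latent Space Broadening works, so the following is only a generic sketch of the idea its name suggests: starting from a few seed items (for example, known misclassifications), pull their k nearest neighbours from an unlabeled pool in embedding space and send that broadened set for annotation. The function names, the use of cosine similarity, and the brute-force search are all assumptions, not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def broaden(seeds, pool, k):
    """Hypothetical LSB-style step: indices of pool items among the k nearest
    neighbours (by cosine similarity) of any seed embedding."""
    selected = set()
    for seed in seeds:
        # Brute-force ranking; a production system would use an ANN index instead.
        ranked = sorted(range(len(pool)), key=lambda i: cosine(seed, pool[i]), reverse=True)
        selected.update(ranked[:k])
    return selected


pool = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [-1.0, 0.0]]
candidates = broaden([[1.0, 0.0]], pool, k=2)
```

Here the seed `[1.0, 0.0]` pulls in the two pool items pointing in nearly the same direction (indices 0 and 2), which is the kind of semantically similar neighbourhood a purely uncertainty-based AL criterion can miss.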
Submitted 21 March, 2025;
originally announced March 2025.
-
A Review of Urban Resilience Frameworks: Transferring Knowledge to Enhance Pandemic Resilience
Authors:
Yue Sun,
Ryan Weightman,
Timur Dogan,
Samitha Samaranayake
Abstract:
Urbanization is rapidly increasing, with urban populations expected to grow significantly by 2050, particularly in developing regions. This expansion brings challenges related to chronic stresses and acute shocks, such as the COVID-19 pandemic, which has underscored the critical role of urban form in a city's capacity to manage public health crises. Despite the heightened interest in urban resilience, research examining the relationship between urban morphology and pandemic resilience remains limited, often focusing solely on density and its effect on disease transmission. This work aims to address this gap by evaluating existing frameworks that analyze the relationship between urban resilience and urban form. By critically reviewing these frameworks, with a particular emphasis on theoretical and quantitative approaches, this study seeks to transfer the knowledge gained to better understand the relationship between pandemic resilience and urban morphology. The work also links theoretical ideas with quantitative frameworks, offering a cohesive analysis. The anticipated novelty of this study lies in its comprehensive assessment of urban resilience frameworks and its identification of current gaps in integrating pandemic-resilience thinking into urban planning and design. The goal is not only to enhance the understanding of urban resilience but also to offer practical guidance for developing more adaptive and effective frameworks for assessing resilience to pandemics in urban environments, thereby preparing cities to better withstand and recover from future crises.
Submitted 11 March, 2025;
originally announced March 2025.
-
Stringent test of $CP$ symmetry in $Σ^+$ hyperon decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
The non-leptonic two-body weak decays $Σ^{+} \to p π^{0}$ and $\barΣ^{-} \to \bar{p} π^{0}$ are investigated, utilizing $(1.0087\pm0.0044)\times10^{10}$ $J/ψ$ events and $(2.7124\pm0.0143)\times10^{9}$ $ψ(3686)$ events collected by the BESIII experiment. The precision of the weak-decay parameters for the decays $Σ^{+} \to p π^{0}$ ($α_{0}$) and $\barΣ^{-} \to \bar{p} π^{0}$ ($\barα_{0}$) is improved by a factor of three compared to the previous world average. Furthermore, the quantum-entangled $Σ^{+}\barΣ^{-}$ system enables the most precise test of $CP$ symmetry for the decay $Σ^+\to pπ^0$, through the asymmetry observable $A_{CP}=(α_{0}+\barα_{0})/(α_{0}-\barα_{0})$, which is measured to be $-0.0118\pm0.0083_{\rm stat}\pm0.0028_{\rm syst}$. Assuming $CP$ conservation, the average decay parameter is determined to be ${\left< α_{\rm 0}\right>} = (α_0-\barα_0)/2=-0.9869\pm0.0011_{\rm stat}\pm0.0016_{\rm syst}$, which is the most precise measurement of a decay asymmetry parameter in the baryon sector. The angular dependence of the ratio of the polarization of the $Σ^+$ in both $J/ψ$ and $ψ(3686)$ decays is studied for the first time.
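The two quantities quoted above are simple functions of $α_0$ and $\barα_0$. The short derivation below is plain algebra on the two definitions (it is not taken from the paper, and it treats $\left<α_0\right> = (α_0-\barα_0)/2$ purely as a definition): exact $CP$ symmetry implies $\barα_0 = -α_0$ and hence $A_{CP}=0$, and the individual parameters can be recovered from $A_{CP}$ and $\left<α_0\right>$.

$$
\begin{aligned}
A_{CP} &= \frac{α_0+\barα_0}{α_0-\barα_0}, \qquad \left<α_0\right> = \frac{α_0-\barα_0}{2},\\
\barα_0 = -α_0 \ (\text{exact } CP) &\;\Rightarrow\; A_{CP} = 0, \quad \left<α_0\right> = α_0,\\
α_0+\barα_0 = 2A_{CP}\left<α_0\right> &\;\Rightarrow\; α_0 = \left<α_0\right>(1+A_{CP}), \quad \barα_0 = -\left<α_0\right>(1-A_{CP}).
\end{aligned}
$$

A nonzero $A_{CP}$ therefore measures a difference in magnitude between $α_0$ and $|\barα_0|$, normalized by their average.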
Submitted 21 March, 2025;
originally announced March 2025.
-
Modifying Large Language Model Post-Training for Diverse Creative Writing
Authors:
John Joon Young Chung,
Vishakh Padmakumar,
Melissa Roemmele,
Yuqian Sun,
Max Kreminski
Abstract:
As creative writing tasks do not have singular correct answers, large language models (LLMs) trained to perform these tasks should be able to generate diverse valid outputs. However, LLM post-training often focuses on improving generation quality but neglects to facilitate output diversity. Hence, in creative writing generation, we investigate post-training approaches that promote both output diversity and quality. Our core idea is to include deviation (the degree of difference between a training sample and all other samples with the same prompt) in the training objective to facilitate learning from rare high-quality instances. By applying our approach to direct preference optimization (DPO) and odds ratio preference optimization (ORPO), we demonstrate that we can promote the output diversity of trained models while only minimally decreasing quality. Our best model, with 8B parameters, achieves diversity on par with a human-created dataset while having output quality similar to the best instruction-tuned models we examined, GPT-4o and DeepSeek-R1. We further validate our approaches with a human evaluation, an ablation, and a comparison to an existing diversification approach, DivPO.
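The deviation defined above can be made concrete as the average distance from one sample to every other sample written for the same prompt. This is a minimal sketch that assumes each sample has already been mapped to an embedding vector and uses cosine distance; the abstract does not say which distance or embedding the authors use, and `deviations` is a hypothetical helper.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; raises on zero vectors, whose direction is undefined."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / (nu * nv)


def deviations(embeddings):
    """Mean cosine distance from each sample to all other samples for the same prompt."""
    n = len(embeddings)
    if n < 2:
        return [0.0] * n  # a lone sample has nothing to deviate from
    return [
        sum(cosine_distance(embeddings[i], embeddings[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Two near-duplicate responses and one distinct response to the same prompt.
scores = deviations([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

The distinct response receives the highest deviation (1.0 versus 0.5 for the duplicates), so weighting a preference loss by such a score would up-weight rare outputs; how the paper folds deviation into the DPO and ORPO objectives is not specified here.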
Submitted 21 March, 2025;
originally announced March 2025.
-
Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions
Authors:
Muhua Zhang,
Lei Ma,
Ying Wu,
Kai Shen,
Yongkui Sun,
Henry Leung
Abstract:
This paper presents an autonomous exploration framework designed for indoor ground mobile robots that use laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, a local-global sampling architecture based on multiple Rapidly Exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive sampling density adjustments, informed by obstacle distribution, enhance exploration coverage potential. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot strictly drives straight along approximately equidistant line segments of the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding the oscillation caused by frequent goal changes in conventional parallelized processes. A waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. It achieves improved mapping coverage and precision in more challenging environments compared to baseline 2D exploration algorithms. It also shows robustness in supporting resource-constrained robot platforms and maintaining mapping consistency across various LiDAR field-of-view (FoV) configurations.
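The stepwise consistent motion strategy described above can be sketched as converting a polyline path into alternating in-place rotations and straight drives of roughly equal length. This is an illustration under assumptions, not the authors' planner: `stepwise_commands` and the `step` length parameter are hypothetical, the robot is assumed to start with heading 0, and each polyline edge is simply split into `round(length / step)` equal pieces.

```python
import math


def stepwise_commands(path, step, initial_heading=0.0):
    """Turn a polyline into ('rotate', radians) and ('drive', metres) commands.

    Each edge is split into max(1, round(length / step)) equal straight pieces,
    and the robot only turns in place at edge junctions.
    """
    commands = []
    heading = initial_heading
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip duplicate waypoints
        target = math.atan2(dy, dx)
        # Smallest signed turn, wrapped into [-pi, pi).
        turn = (target - heading + math.pi) % (2 * math.pi) - math.pi
        if abs(turn) > 1e-9:
            commands.append(("rotate", turn))
        heading = target
        pieces = max(1, round(length / step))
        commands.extend(("drive", length / pieces) for _ in range(pieces))
    return commands


cmds = stepwise_commands([(0.0, 0.0), (2.0, 0.0), (2.0, 1.0)], step=1.0)
```

For this L-shaped path the result is two 1 m drives, a quarter-turn to the left, and one more 1 m drive. Keeping rotation and translation strictly separate in this way is what the abstract credits with stabilizing scan matching.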
Submitted 21 March, 2025;
originally announced March 2025.
-
Salient Object Detection in Traffic Scene through the TSOD10K Dataset
Authors:
Yu Qiu,
Yuhang Sun,
Jie Mei,
Lin Xiao,
Jing Xu
Abstract:
Traffic Salient Object Detection (TSOD) aims to segment the objects critical to driving safety by combining semantic (e.g., collision risks) and visual saliency. Unlike SOD in natural scene images (NSI-SOD), which prioritizes visually distinctive regions, TSOD emphasizes the objects that demand immediate driver attention due to their semantic impact, even when their visual contrast is low. This dual criterion, i.e., bridging perception and contextual risk, redefines saliency for autonomous and assisted driving systems. To address the lack of task-specific benchmarks, we collect the first large-scale TSOD dataset with pixel-wise saliency annotations, named TSOD10K. TSOD10K covers diverse object categories in real-world traffic scenes under challenging weather and illumination conditions (e.g., fog, snowstorms, low contrast, and low light). Methodologically, we propose a Mamba-based TSOD model, termed Tramba. Considering the challenge of distinguishing inconspicuous visual information from complex traffic backgrounds, Tramba introduces a novel Dual-Frequency Visual State Space module equipped with shifted window partitioning and dilated scanning, which enhances the perception of fine details and global structure by hierarchically decomposing high- and low-frequency components. To emphasize critical regions in traffic scenes, we propose a traffic-oriented Helix 2D-Selective-Scan (Helix-SS2D) mechanism that injects driving attention priors while effectively capturing global multi-direction spatial dependencies. We establish a comprehensive benchmark by evaluating Tramba and 22 existing NSI-SOD models on TSOD10K, demonstrating Tramba's superiority. Our research establishes the first foundation for safety-aware saliency analysis in intelligent transportation systems.
Submitted 21 March, 2025;
originally announced March 2025.
-
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Authors:
Kaisi Guan,
Zhengfeng Lai,
Yuchong Sun,
Peng Zhang,
Wei Liu,
Kieran Liu,
Meng Cao,
Ruihua Song
Abstract:
Precisely evaluating semantic alignment between text prompts and generated videos remains a challenge in Text-to-Video (T2V) Generation. Existing text-to-video alignment metrics like CLIPScore only generate coarse-grained scores without fine-grained alignment details, failing to align with human preference. To address this limitation, we propose ETVA, a novel Evaluation method of Text-to-Video Alignment via fine-grained question generation and answering. First, a multi-agent system parses prompts into semantic scene graphs to generate atomic questions. Then we design a knowledge-augmented multi-stage reasoning framework for question answering, where an auxiliary LLM first retrieves relevant common-sense knowledge (e.g., physical laws), and then a video LLM answers the generated questions through a multi-stage reasoning mechanism. Extensive experiments demonstrate that ETVA achieves a Spearman correlation coefficient of 58.47, showing a much higher correlation with human judgment than existing metrics, which attain only 31.0. We also construct a comprehensive benchmark specifically designed for text-to-video alignment evaluation, featuring 2k diverse prompts and 12k atomic questions spanning 10 categories. Through a systematic evaluation of 15 existing text-to-video models, we identify their key capabilities and limitations, paving the way for next-generation T2V generation.
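ETVA is validated by its Spearman rank correlation with human judgments. The dependency-free sketch below computes that statistic as the Pearson correlation of ranks, with tied values given averaged ranks; it is a generic illustration, not the ETVA evaluation code. The abstract's figures (58.47 vs. 31.0) appear to be the coefficient reported on a 0 to 100 scale; that reading is an assumption, and the toy scores below are made up.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks.
    Raises ZeroDivisionError if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

For example, `spearman_rho([0.9, 0.4, 0.7, 0.1], [5, 2, 4, 1])` returns 1.0: the metric scores and the human ratings order the four items identically even though their scales differ.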
Submitted 21 March, 2025;
originally announced March 2025.
-
DeFT: Mitigating Data Dependencies for Flexible Communication Scheduling in Distributed Training
Authors:
Lin Meng,
Yuzhong Sun
Abstract:
Communication scheduling aims to reduce communication bottlenecks in data-parallel (DP) training by maximizing the overlap between computation and communication. However, existing schemes fall short due to three main issues: (1) hard data dependencies prevent some communication from overlapping with computation; (2) high coverage rates limit further performance gains; (3) imbalanced communication/computation times of tensors, caused by partitioning/fusion strategies, create more bubbles. To address these drawbacks, we propose DeFT, a new communication scheduling scheme whose key insight is to mitigate data dependencies and support flexible scheduling in distributed training. DeFT uncovers new overlapping opportunities by transforming the scheduling problem into multiple knapsack problems. Specifically, DeFT eliminates hard dependencies with delayed updates; reduces the coverage rate by adjusting the update frequency and utilizing heterogeneous communication links; and uses the merged computation time of the backward or forward pass as the knapsack capacity to avoid the negative impact of unbalanced tensors. Additionally, DeFT preserves training accuracy by adjusting its scheduling strategy via convergence loss quantification. Extensive experiments with 16 A100 GPUs showed that DeFT achieved speedups of 29% to 115% on three representative benchmarks compared to US-Byte and Bytescheduler, with no loss of accuracy.
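The knapsack framing can be illustrated with a textbook 0/1 knapsack. In this hypothetical sketch, each tensor's communication time is an item whose weight and value are both its duration, and one computation window (e.g. a backward pass) supplies the capacity, so the objective is the total overlapped communication time. Integer time units, the name `schedule_into_window`, and the single-window setup are assumptions; DeFT solves multiple knapsack problems and also relies on delayed updates, which this sketch omits.

```python
from typing import List, Tuple


def schedule_into_window(comm_times: List[int], capacity: int) -> Tuple[int, List[int]]:
    """Hypothetical 0/1 knapsack: choose communication tasks that fit inside one
    computation window, maximizing the communication time that is overlapped.

    comm_times: integer durations (hypothetical units) per tensor.
    capacity:   integer duration of the computation window.
    Returns (overlapped_time, sorted indices of the chosen tasks).
    """
    n = len(comm_times)
    # best[i][c] = max overlapped time using the first i tasks within budget c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w = comm_times[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # leave task i-1 for a later window
            if w <= c:                   # or overlap it in this window
                best[i][c] = max(best[i][c], best[i - 1][c - w] + w)
    # Backtrack: a change in the table means task i-1 was taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= comm_times[i - 1]
    return best[n][capacity], sorted(chosen)
```

With communication times `[4, 3, 5, 2]` and a window of 9 units, the sketch overlaps 9 units in total (several subsets reach this, e.g. 4 + 5 or 4 + 3 + 2); leftover tasks would be scheduled in a later window.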
Submitted 20 March, 2025;
originally announced March 2025.
-
Fast online node labeling with graph subsampling
Authors:
Yushen Huang,
Ertai Luo,
Reza Babenezhad,
Yifan Sun
Abstract:
Large data applications rely on storing data in massive, sparse graphs with millions to trillions of nodes. Graph-based methods, such as node prediction, aim for computational efficiency regardless of graph size. Techniques like localized approximate personalized PageRank (APPR) solve sparse linear systems with complexity independent of graph size, but that complexity depends on the maximum node degree, which for real-world large graphs can be much larger than the average node degree. In this paper, we consider an \emph{online subsampled APPR method}, where messages are intentionally dropped at random. We use tools from graph sparsifiers and matrix linear algebra to give approximation bounds on the graph's spectral properties ($O(1/ε^2)$ edges) and on node classification performance (added $O(nε)$ overhead).
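As a rough illustration of dropping messages in APPR, the sketch below uses the standard residual push for personalized PageRank and, when `keep_prob < 1`, keeps each neighbor message independently with that probability and rescales it by `1/keep_prob` so it is unbiased in expectation. The rescaling rule, the parameter names, and the defaults (`alpha=0.15`, `eps`) are assumptions chosen for illustration; this is not the paper's online subsampled method, and its bounds are not reproduced here.

```python
import random
from collections import deque
from typing import Dict, List, Optional


def appr_push(graph: Dict[int, List[int]], seed: int, alpha: float = 0.15,
              eps: float = 1e-4, keep_prob: float = 1.0,
              rng: Optional[random.Random] = None,
              max_pushes: int = 1_000_000) -> Dict[int, float]:
    """Approximate personalized PageRank by residual pushes from `seed`.

    graph maps each node to its neighbor list (every node must be a key).
    keep_prob < 1 is a hypothetical subsampling variant: each neighbor message
    survives with probability keep_prob and is rescaled by 1/keep_prob.
    """
    rng = rng or random.Random(0)
    p: Dict[int, float] = {}           # PPR estimates
    r: Dict[int, float] = {seed: 1.0}  # residual mass not yet pushed
    queue = deque([seed])
    pushes = 0
    while queue and pushes < max_pushes:
        u = queue.popleft()
        deg = len(graph[u])
        ru = r.get(u, 0.0)
        if deg == 0 or ru < eps * deg:
            continue  # stale queue entry or residual below threshold
        pushes += 1
        p[u] = p.get(u, 0.0) + alpha * ru  # keep an alpha fraction at u
        r[u] = 0.0
        share = (1 - alpha) * ru / deg     # spread the rest over neighbors
        for v in graph[u]:
            if keep_prob < 1.0 and rng.random() >= keep_prob:
                continue  # message dropped at random
            r[v] = r.get(v, 0.0) + share / keep_prob
            if r[v] >= eps * len(graph[v]):
                queue.append(v)
    return p
```

With `keep_prob=1.0` each push conserves mass, so the returned scores sum to nearly 1 (the remainder stays in residuals below `eps` times each node's degree). Each push touches every neighbor of the pushed node, which is where the degree dependence comes from; dropping messages reduces the number of residual updates per push.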
Submitted 20 March, 2025;
originally announced March 2025.
-
Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity
Authors:
Daniele Bracale,
Moulinath Banerjee,
Cong Shi,
Yuekai Sun
Abstract:
We consider price competition among multiple sellers over a selling horizon of $T$ periods. In each period, sellers simultaneously offer their prices and subsequently observe their respective demand that is unobservable to competitors. The demand function for each seller depends on all sellers' prices through a private, unknown, and nonlinear relationship. To address this challenge, we propose a semi-parametric least-squares estimation of the nonlinear mean function, which does not require sellers to communicate demand information. We show that when all sellers employ our policy, their prices converge at a rate of $O(T^{-1/7})$ to the Nash equilibrium prices that sellers would reach if they were fully informed. Each seller incurs a regret of $O(T^{5/7})$ relative to a dynamic benchmark policy. A theoretical contribution of our work is proving the existence of equilibrium under shape-constrained demand functions via the concept of $s$-concavity and establishing regret bounds of our proposed policy. Technically, we also establish new concentration results for the least squares estimator under shape constraints. Our findings offer significant insights into dynamic competition-aware pricing and contribute to the broader study of non-parametric learning in strategic decision-making.
Submitted 20 March, 2025;
originally announced March 2025.
-
Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization
Authors:
Yudao Sun,
Juan Yin,
Juan Zhao,
Fan Zhang,
Yongheng Liu,
Hongji Chen
Abstract:
Neural network language models (LMs) face significant challenges in generalization and robustness. Currently, many studies focus on improving either generalization or robustness in isolation, with no methods addressing both aspects simultaneously, which makes it difficult to develop LMs that are both robust and generalizable. In this paper, we propose UEGR, a bi-stage optimization framework that uniformly enhances both the generalization and robustness of LMs. Specifically, during the forward propagation stage, we enrich the output probability distributions of adversarial samples via adaptive dropout, which generates diverse sub-models, and incorporate the JS divergence and adversarial losses of these output distributions to reinforce output stability. During the backward propagation stage, we compute parameter saliency scores and selectively update only the most critical parameters to minimize unnecessary deviations and consolidate the model's resilience. Theoretical analysis shows that our framework includes gradient regularization, which limits the model's sensitivity to input perturbations, and selective parameter updates, which flatten the loss landscape, thus improving both generalization and robustness. The experimental results show that our method significantly improves the generalization and robustness of LMs compared to other existing methods across 13 publicly available language datasets, achieving state-of-the-art (SOTA) performance.
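The consistency term can be pictured as a Jensen-Shannon divergence across the output distributions of several dropout sub-models on the same input. The sketch computes the equal-weight generalized JSD, i.e. the entropy of the mixture minus the mean entropy, in nats. How UEGR weights this term, combines it with the adversarial losses, or produces the distributions is not specified in the abstract, so only the divergence itself is shown.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; terms with probability 0 contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(dists: List[Sequence[float]]) -> float:
    """Equal-weight generalized Jensen-Shannon divergence of K distributions:
    H(mean of P_k) - mean of H(P_k). It is zero iff all P_k coincide."""
    k = len(dists)
    mixture = [sum(col) / k for col in zip(*dists)]
    return entropy(mixture) - sum(entropy(p) for p in dists) / k
```

Identical sub-model outputs give a divergence of 0, while two one-hot distributions on different classes give the two-distribution maximum, ln 2.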
Submitted 19 March, 2025;
originally announced March 2025.
-
Causal Discovery and Counterfactual Reasoning to Optimize Persuasive Dialogue Policies
Authors:
Donghuo Zeng,
Roberto Legaspi,
Yuewen Sun,
Xinshuai Dong,
Kazushi Ikeda,
Peter Spirtes,
Kun Zhang
Abstract:
Tailoring persuasive conversations to users leads to more effective persuasion. However, existing dialogue systems often struggle to adapt to dynamically evolving user states. This paper presents a novel method that leverages causal discovery and counterfactual reasoning for optimizing system persuasion capability and outcomes. We employ the Greedy Relaxation of the Sparsest Permutation (GRaSP) algorithm to identify causal relationships between user and system utterance strategies, treating user strategies as states and system strategies as actions. GRaSP identifies user strategies as causal factors influencing system responses, which inform Bidirectional Conditional Generative Adversarial Networks (BiCoGAN) in generating counterfactual utterances for the system. Subsequently, we use the Dueling Double Deep Q-Network (D3QN) model to utilize counterfactual data to determine the best policy for selecting system utterances. Our experiments with the PersuasionForGood dataset show measurable improvements in persuasion outcomes using our approach over baseline methods. The observed increase in cumulative rewards and Q-values highlights the effectiveness of causal discovery in enhancing counterfactual reasoning and optimizing reinforcement learning policies for online dialogue systems.
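D3QN combines a dueling value/advantage head with the Double DQN target. The stdlib-only sketch below shows both pieces for a single transition, with Q-values as plain lists indexed by action (here, hypothetically, system utterance strategies). Network training, replay, and the BiCoGAN counterfactual data are out of scope.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online network picks the next action and the
    target network evaluates it, which curbs overestimation."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]
```

In the test case the online network prefers action 1, so the target uses the target network's value 2.0 for that action and gives 1 + 0.9 × 2.0 = 2.8, whereas a plain max over the target network would have used 5.0.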
Submitted 19 March, 2025;
originally announced March 2025.
-
The Emperor's New Clothes in Benchmarking? A Rigorous Examination of Mitigation Strategies for LLM Benchmark Data Contamination
Authors:
Yifan Sun,
Han Wang,
Dongbai Li,
Gang Wang,
Huan Zhang
Abstract:
Benchmark Data Contamination (BDC), the inclusion of benchmark testing samples in the training set, has raised increasing concerns in Large Language Model (LLM) evaluation, leading to falsely inflated performance estimates and undermining evaluation reliability. To address this, researchers have proposed various mitigation strategies to update existing benchmarks, including modifying original questions or generating new ones based on them. However, a rigorous examination of the effectiveness of these mitigation strategies remains lacking. In this paper, we design a systematic and controlled pipeline along with two novel metrics, fidelity and contamination resistance, to provide a fine-grained and comprehensive assessment of existing BDC mitigation strategies. Previous assessment methods, such as accuracy drop and accuracy matching, focus solely on aggregate accuracy, often leading to incomplete or misleading conclusions. Our metrics address this limitation by emphasizing question-level evaluation result matching. Extensive experiments with 10 LLMs, 5 benchmarks, 20 BDC mitigation strategies, and 2 contamination scenarios reveal that no existing strategy significantly improves resistance over the vanilla case (i.e., no benchmark update) across all benchmarks, and none effectively balances fidelity and contamination resistance. These findings underscore the urgent need for designing more effective BDC mitigation strategies. Our code repository is available at https://github.com/ASTRAL-Group/BDC_mitigation_assessment.
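The gap between aggregate and question-level comparisons can be made concrete. The abstract does not give formulas for fidelity or contamination resistance, so the sketch below uses a hypothetical proxy: the fraction of questions on which two evaluations agree (both right or both wrong). Two evaluations with identical accuracy can still disagree on every question.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Aggregate accuracy over per-question correctness flags."""
    return sum(correct) / len(correct)


def question_level_match(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Hypothetical proxy for question-level matching: the share of questions
    on which two evaluations give the same outcome."""
    if len(a) != len(b):
        raise ValueError("evaluations must cover the same questions")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

For per-question results `[True, True, False, False]` before an update and `[False, False, True, True]` after, accuracy is 0.5 in both cases (an accuracy drop of zero), yet the question-level match is 0.0.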
Submitted 20 March, 2025;
originally announced March 2025.
-
Upper critical fields in high-$ T_{\rm{c}} $ superconductors
Authors:
Wei Wei,
Yuling Xiang,
Qiang Hou,
Yue Sun,
Zhixiang Shi
Abstract:
Since the discovery of high-temperature superconductivity in cuprates, understanding the unconventional pairing mechanism has remained one of the most significant challenges. The upper critical field ($H_{\rm{c2}}$) is an essential parameter for obtaining information on the pair-breaking mechanism, coherence length $ξ$, and pairing symmetry, all of which are crucial for understanding unconventional superconducting mechanisms. Here, we provide a brief review of studies on $ H_{\rm{c2}} $ in several representative series of cuprate, iron-based, and nickelate superconductors. By comparing the behavior of $ H_{\rm{c2}} $ as a function of temperature, doping concentration, and anisotropy across these three major classes of superconductors, we hope to contribute to a better understanding of the complex pairing interactions in high-temperature superconductors.
Submitted 20 March, 2025;
originally announced March 2025.
-
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Authors:
Zhiyuan Liu,
Yuting Zhang,
Feng Liu,
Changwang Zhang,
Ying Sun,
Jun Wang
Abstract:
Multimodal Large Language Models (MLLMs) have gained significant traction for their ability to process diverse input data types and generate coherent, contextually relevant outputs across various applications. While supervised fine-tuning (SFT) has been the predominant approach to enhance MLLM capabilities in task-specific optimization, it often falls short in fostering crucial generalized reasoning abilities. Although reinforcement learning (RL) holds great promise in overcoming these limitations, it encounters two significant challenges: (1) its generalized capacities in multimodal tasks remain largely unexplored, and (2) its training constraints, including the constant Kullback-Leibler divergence or the clamp strategy, often result in suboptimal bottlenecks. To address these challenges, we propose OThink-MR1, an advanced MLLM equipped with profound comprehension and reasoning capabilities across multimodal tasks. Specifically, we introduce Group Relative Policy Optimization with a dynamic Kullback-Leibler strategy (GRPO-D), which markedly enhances reinforcement learning (RL) performance. For Qwen2-VL-2B-Instruct, GRPO-D achieves a relative improvement of more than 5.72% over SFT and more than 13.59% over GRPO in same-task evaluation on two adapted datasets. Furthermore, GRPO-D demonstrates remarkable cross-task generalization capabilities, with an average relative improvement of more than 61.63% over SFT in cross-task evaluation. These results highlight that the MLLM trained with GRPO-D on one multimodal task can be effectively transferred to another task, underscoring the superior generalized reasoning capabilities of our proposed OThink-MR1 model.
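GRPO scores a group of sampled responses to the same prompt and normalizes each reward against the group's mean and standard deviation. The sketch below shows that normalization (population standard deviation plus a small `eps`), alongside a placeholder KL weight that changes over training. The linear schedule, `beta_end=0.04`, and both function names are assumptions; the abstract says only that GRPO-D replaces a constant KL strategy with a dynamic one.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardize each response's reward within its
    group (all responses sampled for the same prompt)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_coefficient(step: int, total_steps: int,
                   beta_start: float = 0.0, beta_end: float = 0.04) -> float:
    """Hypothetical dynamic KL weight: linear interpolation from beta_start to
    beta_end, clamped to [0, 1] progress. Not GRPO-D's actual schedule."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + (beta_end - beta_start) * frac
```

A group whose rewards are all equal yields zero advantages, so such prompts contribute no policy-gradient signal in this formulation.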
Submitted 28 March, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
Search for the radiative leptonic decay $D^+\toγe^+ν_e$ with Deep Learning
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using 20.3$~\rm fb^{-1}$ of $e^+e^-$ annihilation data collected at a center-of-mass energy of 3.773$~\rm GeV$ with the BESIII detector, we report an improved search for the radiative leptonic decay $D^+\toγe^+ν_e$. An upper limit on its partial branching fraction for photon energies $E_γ>10~\rm MeV$ is determined to be $1.2\times10^{-5}$ at 90\% confidence level, which excludes most current theoretical predictions. A sophisticated deep learning approach with thorough validation, based on the Transformer architecture, is implemented to efficiently distinguish the signal from massive backgrounds.
Submitted 20 March, 2025;
originally announced March 2025.
-
JADES and SAPPHIRES: Galaxy Metamorphosis Amidst a Huge, Luminous Emission-line Region
Authors:
Francesco D'Eugenio,
Jakob M. Helton,
Kevin Hainline,
Fengwu Sun,
Roberto Maiolino,
Pablo G. Pérez-González,
Ignas Juodžbalis,
Santiago Arribas,
Andrew J. Bunker,
Stefano Carniani,
Emma Curtis-Lake,
Eiichi Egami,
Daniel J. Eisenstein,
Benjamin D. Johnson,
Brant Robertson,
Sandro Tacchella,
Christopher N. A. Willmer,
Chris Willott,
William M. Baker,
A. Lola Danhaive,
Qiao Duan,
Yoshinobu Fudamoto,
Gareth C. Jones,
Xiaojing Lin,
Weizhe Liu
, et al. (10 additional authors not shown)
Abstract:
We report the discovery of a remarkably large and luminous line-emitting nebula extending on either side of the Balmer-break galaxy JADES-GS-518794 at z=5.89, detected with JADES JWST/NIRCam imaging in [O III]$λλ$4959,5007 and H$α$ and spectroscopically confirmed with NIRCam/WFSS thanks to the pure-parallel SAPPHIRES programme. The end-to-end velocity offset is $Δv=830\pm130$ km s$^{-1}$. Nebulae with such large size and high luminosity (25-pkpc diameter, L[O III] = $1.2\times 10^{10}$ L$_\odot$) are routinely observed around bright quasars, unlike JADES-GS-518794. With a stellar mass of $10^{10.1}$ M$_\odot$, this galaxy is at the knee of the mass function at z=6. Its star-formation rate declined for some time (10-100 Myr prior to observation), followed by a recent (10 Myr) upturn. This system is part of a candidate large-scale galaxy overdensity, with an excess of Balmer-break galaxies compared to the field (3 $σ$). We discuss the possible origin of this nebula as material from a merger or gas expelled by an active galactic nucleus (AGN). The symmetry of the nebula, its bubble-like morphology, kinematics, high luminosity, and the extremely high equivalent width of [OIII] together favour the AGN interpretation. Intriguingly, there may be a physical connection between the presence of such a large, luminous nebula and the possible metamorphosis of the central galaxy towards quenching.
Submitted 19 March, 2025;
originally announced March 2025.
-
Visual Position Prompt for MLLM based Visual Grounding
Authors:
Wei Tang,
Yanpeng Sun,
Qinying Gu,
Zechao Li
Abstract:
Although Multimodal Large Language Models (MLLMs) excel at various image-related tasks, they encounter challenges in precisely aligning coordinates with spatial information within images, particularly in position-aware tasks such as visual grounding. This limitation arises from two key factors. First, MLLMs lack explicit spatial references, making it difficult to associate textual descriptions with precise image locations. Second, their feature extraction processes prioritize global context over fine-grained spatial details, leading to weak localization capability. To address this issue, we introduce VPP-LLaVA, an MLLM equipped with a Visual Position Prompt (VPP) to improve its grounding capability. VPP-LLaVA integrates two complementary mechanisms. The global VPP overlays learnable, axis-like embeddings onto the input image to provide structured spatial cues. The local VPP focuses on fine-grained localization by incorporating position-aware queries, which suggest probable object locations. We also introduce a VPP-SFT dataset with 0.6M samples, consolidating high-quality visual grounding data into a compact format for efficient model training. Training on this dataset with VPP enhances the model's performance, achieving state-of-the-art results on standard grounding benchmarks despite using fewer training samples compared to other MLLMs like MiniGPT-v2, which rely on much larger datasets ($\sim$21M samples). The code and VPP-SFT dataset will be available at https://github.com/WayneTomas/VPP-LLaVA upon acceptance.
Submitted 24 March, 2025; v1 submitted 19 March, 2025;
originally announced March 2025.
-
Strong correlation between $H$-linear magnetoresistance and strange metal in FeSe superconductor
Authors:
Xinyue Wang,
Yue Sun,
Wei Wei,
Qiang Hou,
Nan Zhou,
Yufeng Zhang,
Zhixiang Shi
Abstract:
In strange metals, a strong and anomalous scattering effect exists and increases linearly with temperature. In FeSe, we observed that the temperature dependence of resistivity exhibits non-Fermi liquid behavior in two regions, below and above a critical pressure $p_\text{c}\sim2$ GPa. As pressure increases, a transition from quadratic to nonsaturating magnetoresistance is observed, with a distinct crossover between these two behaviors indicated by $B^{*}$ in the derivative analysis. After subtracting the quadratic term from the magnetoresistance, the residual magnetoresistance clearly exhibits $H$-linear behavior. Additionally, two segments of $H$-linear magnetoresistance appear with increasing pressure, each arising from a distinct origin. Notably, both $H$-linear magnetoresistances exist and develop within the strange metal states. These results suggest that $H$-linear magnetoresistance is strongly correlated with the strange metal state, which may affect superconductivity in FeSe under pressure. Our study provides valuable insights into the strange metal state and clues to the underlying unconventional superconductivity.
Submitted 19 March, 2025;
originally announced March 2025.
-
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Authors:
Feifei Li,
Mi Zhang,
Yiming Sun,
Min Yang
Abstract:
Text-to-image diffusion models have achieved state-of-the-art results in synthesis tasks; however, there is a growing concern about their potential misuse in creating harmful content. To mitigate these risks, post-hoc model intervention techniques, such as concept unlearning and safety guidance, have been developed. However, fine-tuning model weights or adapting the hidden states of the diffusion model operates in an uninterpretable way, making it unclear which part of the intermediate variables is responsible for unsafe generation. These interventions severely affect the sampling trajectory when erasing harmful concepts from complex, multi-concept prompts, thus hindering their practical use in real-world settings. In this work, we propose the safe generation framework Detect-and-Guide (DAG), leveraging the internal knowledge of diffusion models to perform self-diagnosis and fine-grained self-regulation during the sampling process. DAG first detects harmful concepts from noisy latents using refined cross-attention maps of optimized tokens, then applies safety guidance with adaptive strength and editing regions to negate unsafe generation. The optimization only requires a small annotated dataset and can provide precise detection maps with generalizability and concept specificity. Moreover, DAG does not require fine-tuning of diffusion models, and therefore introduces no loss to their generation diversity. Experiments on erasing sexual content show that DAG achieves state-of-the-art safe generation performance, balancing harmfulness mitigation and text-following performance on multi-concept real-world prompts.
Submitted 19 March, 2025;
originally announced March 2025.
-
GenM$^3$: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Authors:
Junyu Shi,
Lijiang Liu,
Yong Sun,
Zhiyuan Zhang,
Jinni Zhou,
Qiang Nie
Abstract:
Scaling up motion datasets is crucial to enhance motion generation capabilities. However, training on large-scale multi-source datasets introduces data heterogeneity challenges due to variations in motion content. To address this, we propose the Generative Pretrained Multi-path Motion Model (GenM$^3$), a comprehensive framework designed to learn unified motion representations. GenM$^3$ comprises two components: 1) a Multi-Expert VQ-VAE (MEVQ-VAE) that adapts to different dataset distributions to learn a unified discrete motion representation, and 2) a Multi-path Motion Transformer (MMT) that improves intra-modal representations by using separate modality-specific pathways, each with densely activated experts to accommodate variations within that modality, and improves inter-modal alignment through the shared text-motion pathway. To enable large-scale training, we integrate and unify 11 high-quality motion datasets (approximately 220 hours of motion data) and augment the result with textual annotations (nearly 10,000 motion sequences labeled by a large language model and 300+ by human experts). After training on our integrated dataset, GenM$^3$ achieves a state-of-the-art FID of 0.035 on the HumanML3D benchmark, surpassing state-of-the-art methods by a large margin. It also demonstrates strong zero-shot generalization on the IDEA400 dataset, highlighting its effectiveness and adaptability across diverse motion scenarios.
Submitted 19 March, 2025;
originally announced March 2025.
-
Prototype Perturbation for Relaxing Alignment Constraints in Backward-Compatible Learning
Authors:
Zikun Zhou,
Yushuai Sun,
Wenjie Pei,
Xin Li,
Yaowei Wang
Abstract:
The traditional paradigm to update retrieval models requires re-computing the embeddings of the gallery data, a time-consuming and computationally intensive process known as backfilling. To circumvent backfilling, Backward-Compatible Learning (BCL) has been widely explored, which aims to train a new model compatible with the old one. Many previous works focus on effectively aligning the embeddings of the new model with those of the old one to enhance the backward-compatibility. Nevertheless, such strong alignment constraints would compromise the discriminative ability of the new model, particularly when different classes are closely clustered and hard to distinguish in the old feature space. To address this issue, we propose to relax the constraints by introducing perturbations to the old feature prototypes. This allows us to align the new feature space with a pseudo-old feature space defined by these perturbed prototypes, thereby preserving the discriminative ability of the new model in backward-compatible learning. We have developed two approaches for calculating the perturbations: Neighbor-Driven Prototype Perturbation (NDPP) and Optimization-Driven Prototype Perturbation (ODPP). Particularly, they take into account the feature distributions of not only the old but also the new models to obtain proper perturbations along with new model updating. Extensive experiments on the landmark and commodity datasets demonstrate that our approaches perform favorably against state-of-the-art BCL algorithms.
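The idea of aligning to perturbed old prototypes can be sketched with a simple neighbor-driven rule: when two old class prototypes sit closer than a margin, move each away from its nearest neighbor so the pseudo-old space is better separated. This rule, the `margin` and `step` parameters, and the Euclidean distance are illustrative assumptions, not the NDPP or ODPP formulas, which per the abstract also take the new model's feature distribution into account.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def perturb_prototypes(protos: Sequence[Sequence[float]], margin: float,
                       step: float = 0.5) -> List[Vector]:
    """Hypothetical neighbor-driven perturbation (needs >= 2 prototypes).

    If a prototype's nearest neighbor is closer than `margin`, move it away
    from that neighbor by `step` times the shortfall. All moves are computed
    from the original prototypes, so the result is order-independent.
    """
    out: List[Vector] = []
    for i, p in enumerate(protos):
        j = min((k for k in range(len(protos)) if k != i),
                key=lambda k: _dist(p, protos[k]))
        d = _dist(p, protos[j])
        if d >= margin or d == 0.0:
            out.append(list(p))  # well separated (or degenerate): keep as-is
            continue
        shift = step * (margin - d) / d  # scales the vector pointing away from j
        out.append([x + shift * (x - y) for x, y in zip(p, protos[j])])
    return out
```

With prototypes at 0, 1, and 10 on a line and a margin of 2, the two close prototypes each move 0.5 apart (to -0.5 and 1.5), exactly reaching the margin, while the distant one is unchanged. The new model would then be aligned to these perturbed targets instead of the raw old prototypes.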
Submitted 18 March, 2025;
originally announced March 2025.
-
MAST-Pro: Dynamic Mixture-of-Experts for Adaptive Segmentation of Pan-Tumors with Knowledge-Driven Prompts
Authors:
Runqi Meng,
Sifan Song,
Pengfei Jin,
Yujin Oh,
Lin Teng,
Yulin Wang,
Yiqun Sun,
Ling Chen,
Xiang Li,
Quanzheng Li,
Ning Guo,
Dinggang Shen
Abstract:
Accurate tumor segmentation is crucial for cancer diagnosis and treatment. While foundation models have advanced general-purpose segmentation, existing methods still struggle with: (1) limited incorporation of medical priors, (2) imbalance between generic and tumor-specific features, and (3) high computational costs for clinical adaptation. To address these challenges, we propose MAST-Pro (Mixture-of-experts for Adaptive Segmentation of pan-Tumors with knowledge-driven Prompts), a novel framework that integrates dynamic Mixture-of-Experts (D-MoE) and knowledge-driven prompts for pan-tumor segmentation. Specifically, text and anatomical prompts provide domain-specific priors, guiding tumor representation learning, while D-MoE dynamically selects experts to balance generic and tumor-specific feature learning, improving segmentation accuracy across diverse tumor types. To enhance efficiency, we employ Parameter-Efficient Fine-Tuning (PEFT), optimizing MAST-Pro with significantly reduced computational overhead. Experiments on multi-anatomical tumor datasets demonstrate that MAST-Pro outperforms state-of-the-art approaches, achieving up to a 5.20% improvement in average DSC while reducing trainable parameters by 91.04%, without compromising accuracy.
Submitted 18 March, 2025;
originally announced March 2025.
-
Dynamical Classification of Supercooled Liquids: Critical Cooling Rates and Entropic Signatures
Authors:
B. Zhang,
M. Zhang,
D. Y. Sun,
X. G. Gong
Abstract:
Using molecular dynamics simulations, we systematically investigate supercooled liquids formed at cooling rates below and above the critical cooling rate (CCR). By analyzing the distribution of short-time averaged potential energies (DoPE) and crystallization behaviors, we identify two distinct dynamical regimes in supercooled liquids: the glass-forming regime (GFR) and the crystal-forming regime (CFR). For systems cooled below the CCR (CFR), the DoPE exhibits a sharp peak, indicative of reduced configurational entropy. In contrast, liquids cooled above the CCR (GFR) display a broad DoPE distribution, reflecting higher configurational entropy. These findings establish a robust classification framework for supercooled liquids. Further analysis reveals a crossover temperature (T_x) in both regimes, consistent with the freezing temperature (T_f). Near T_x, the relationship between the crystallization barrier and temperature changes abruptly. Below T_x, the CFR crystallizes marginally faster than the GFR, whereas above T_x, the influence of cooling rate on crystallization rate diminishes. These results further divide the GFR and CFR into high- and low-temperature sub-regimes, highlighting the interplay between thermodynamics and kinetics in supercooled liquids.
Submitted 6 April, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
A metropolitan-scale trapped-ion quantum network node with hybrid multiplexing enhancements
Authors:
Z. -B. Cui,
Z. -Q. Wang,
P. -C. Lai,
Y. Wang,
J. -X. Shi,
P. -Y. Liu,
Y. -D. Sun,
Z. -C. Tian,
Y. -B. Liang,
B. -X. Qi,
Y. -Y. Huang,
Z. -C. Zhou,
Y. -K. Wu,
Y. Xu,
L. -M. Duan,
Y. -F. Pu
Abstract:
Quantum networks and quantum repeaters are promising ways to scale up quantum information systems, enabling various applications with unprecedented performance. A current bottleneck in building a long-distance quantum network is that the distribution rate of heralded entanglement between remote network nodes is typically much lower than the decoherence rate of each local node, which obstructs the implementation of a metropolitan-scale quantum network with more than two remote nodes. A promising scheme to accelerate remote entanglement distribution is multiplexing enhancement based on a multimode quantum network node. In this work, we experimentally realize a functional $5$-ion quantum network node containing two different types of qubits. We employ a hybrid multiplexing scheme combining multiple excitation and ion shuttling, in which up to $44$ time-bin modes are generated and sent through a long fiber to boost the entangling rate. With this scheme, we generate heralded ion-photon entanglement with fidelities of $96.8\%$/$94.6\%$/$89.8\%$ and success rates of $263\,\text{s}^{-1}$/$40\,\text{s}^{-1}$/$4.28\,\text{s}^{-1}$ over fibers of $3\,$m/$1\,$km/$12\,$km, respectively. In addition, dual-type encoding protects the quantum information stored in the memory qubit from the destructive ion-photon entangling attempts, and a memory coherence time of $366\,$ms is achieved. This coherence time exceeds the expected entanglement generation time of $234\,$ms over a $12\,$km fiber, the first time this has been achieved in a metropolitan-scale quantum network node.
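As a quick consistency check on the final claim, assume (our assumption, not stated in the abstract) that the expected entanglement generation time is simply the reciprocal of the $12\,$km success rate. The quoted $234\,$ms then follows directly; $T_{\rm ent}$, $R_{12\,\rm km}$ and $T_{\rm coh}$ are labels introduced here for illustration:

```latex
T_{\rm ent} \approx \frac{1}{R_{12\,\mathrm{km}}}
            = \frac{1}{4.28\,\mathrm{s}^{-1}}
            \approx 0.234\,\mathrm{s} = 234\,\mathrm{ms}
            \;<\; T_{\rm coh} = 366\,\mathrm{ms}
```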
Submitted 18 March, 2025;
originally announced March 2025.
-
Send Pilot or Data? Leveraging Age of Channel State Information for Throughput Maximization
Authors:
Sirin Chakraborty,
Yin Sun
Abstract:
In this paper, we study the optimal timing for pilot and data transmissions to maximize effective throughput, also known as goodput, over a wireless fading channel. The receiver utilizes the received pilot signal and its Age of Information (AoI), termed the Age of Channel State Information (AoCSI), to estimate the channel state. Based on this estimation, the transmitter selects an appropriate modulation and coding scheme (MCS) to maximize goodput while ensuring compliance with a predefined block error probability constraint. Furthermore, we design an optimal pilot scheduling policy that determines whether to transmit a pilot or data at each time step, with the objective of maximizing the long-term average goodput. This problem involves a non-monotonic AoI metric optimization challenge, as the goodput function is non-monotonic with respect to AoCSI. The numerical results illustrate the performance gains achieved by the proposed policy under various SNR levels and mobility speeds.
Submitted 17 March, 2025;
originally announced March 2025.
-
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Authors:
Weiyu Guo,
Ziyang Chen,
Shaoguang Wang,
Jianxiang He,
Yijie Xu,
Jinhui Ye,
Ying Sun,
Hui Xiong
Abstract:
Understanding long video content is a complex endeavor that often relies on densely sampled frame captions or end-to-end feature selectors, yet these techniques commonly overlook the logical relationships between textual queries and visual elements. In practice, computational constraints necessitate coarse frame subsampling, a challenge analogous to ``finding a needle in a haystack.'' To address this issue, we introduce a semantics-driven search framework that reformulates keyframe selection under the paradigm of Visual Semantic-Logical Search. Specifically, we systematically define four fundamental logical dependencies: 1) spatial co-occurrence, 2) temporal proximity, 3) attribute dependency, and 4) causal order. These relations dynamically update frame sampling distributions through an iterative refinement process, enabling context-aware identification of semantically critical frames tailored to specific query requirements. Our method sets a new state of the art in key-frame selection metrics on the manually annotated benchmark. Furthermore, when applied to downstream video question-answering tasks, the proposed approach achieves the largest performance gains over existing methods on LongVideoBench and Video-MME, validating its effectiveness in bridging the logical gap between textual queries and visual-temporal reasoning. The code will be publicly available.
Submitted 17 March, 2025;
originally announced March 2025.
-
Enhancing Job Salary Prediction with Disentangled Composition Effect Modeling: A Neural Prototyping Approach
Authors:
Yang Ji,
Ying Sun,
Hengshu Zhu
Abstract:
In the era of the knowledge economy, understanding how job skills influence salary is crucial for promoting recruitment with competitive salary systems and aligned salary expectations. Despite efforts on salary prediction based on job positions and talent demographics, methods that effectively discern the intricate composition effect of set-structured skills on job salary are still lacking. While recent advances in neural networks have significantly improved the accuracy of set-based quantitative modeling, their lack of explainability hinders insight into the skills' composition effects. Indeed, explaining models on set data is challenging due to its combinatorial nature, rich semantics, and unique format. To this end, we propose a novel intrinsically explainable set-based neural prototyping approach, namely LGDESetNet, for explainable salary prediction that can reveal disentangled skill sets that impact salary from both local and global perspectives. Specifically, we propose a skill graph-enhanced disentangled discrete subset selection layer to identify multi-faceted influential input subsets with varied semantics. Furthermore, we propose a set-oriented prototype learning method to extract globally influential prototypical sets. The resulting output is transparently derived from the semantic interplay between these input subsets and global prototypes. Extensive experiments on four real-world datasets demonstrate that our method outperforms state-of-the-art baselines in salary prediction while providing explainable insights into salary-influencing patterns.
Submitted 8 April, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
Weyl Fermion Manipulation through Magnetic Transitions in the Ferromagnetic Non-Centrosymmetric Weyl semimetal PrAlSi
Authors:
K. P. Wang,
W. J. Shi,
W. Z. Cao,
X. T. Yang,
Z. Y. Lv,
C. Peng,
C. Chen,
D. F. Liu,
H. F. Yang,
L. X. Yang,
M. Lyu,
P. J. Sun,
E. K. Liu,
M. Ye,
Y. L. Chen,
Y. Sun,
Y. P. Qi,
Z. K. Liu
Abstract:
PrAlSi, a non-centrosymmetric ferromagnetic Weyl semimetal candidate with a Curie temperature of 17.8 K, offers a unique platform for exploring the interplay of symmetry breaking and topological electronic structures. Until now, the distribution of Weyl fermions in PrAlSi, as well as their evolution across the ferromagnetic-to-paramagnetic phase transition, has not been explored. Here, we uncover the presence of Weyl fermions in PrAlSi and demonstrate that they can be manipulated through the magnetic phase transition. Our ab initio calculations indicate a shift in the momentum and energy positions of the Weyl fermions, alongside an increase in the number of Weyl points due to band splitting. The predicted band splitting and shifting of Weyl fermions are corroborated by our angle-resolved photoemission spectroscopy experiments. According to our calculations, such manipulation of Weyl fermions leads to the appearance of a net chirality charge and a significant modulation of the optical conductivity. Our research presents a novel method for tuning the properties of Weyl semimetals by controlling Weyl fermions through magnetic phase transitions, positioning PrAlSi as a model system.
Submitted 17 March, 2025;
originally announced March 2025.
-
Topological invariant for holographic Weyl-$\mathrm Z_2$ semimetal
Authors:
Xiantong Chen,
Xuanting Ji,
Ya-Wen Sun
Abstract:
The occurrence of a topological phase transition can be demonstrated by directly observing a change in the topological invariant. For holographic topological semimetals, the strongly coupled nature of the system means that a topological Hamiltonian method must be employed to calculate the topological invariants. We calculate the topological invariants for the holographic Weyl semimetal and the holographic Weyl-$\mathrm Z_2$ semimetal, which correspond to the chiral charge and the spin-Chern number, respectively. This is achieved by probing fermions within the system and deriving the topological Hamiltonian from the zero-frequency Green's function. In both cases, we identify an effective band structure with an infinite number of Weyl or $\mathrm Z_2$ nodes, a distinctive feature that sets holographic systems apart from weakly coupled systems. The topological invariants of these nodes are computed numerically and found to be nonzero, confirming that the nodes are topologically nontrivial.
Submitted 20 March, 2025; v1 submitted 17 March, 2025;
originally announced March 2025.
-
DART: Dual-level Autonomous Robotic Topology for Efficient Exploration in Unknown Environments
Authors:
Qiming Wang,
Yulong Gao,
Yang Wang,
Xiongwei Zhao,
Yijiao Sun,
Xiangyan Kong
Abstract:
Conventional algorithms in autonomous exploration face challenges due to their inability to accurately and efficiently identify the spatial distribution of convex regions in the real-time map. These methods often prioritize navigation toward the nearest or most information-rich frontiers (the boundaries between known and unknown areas), resulting in incomplete exploration of convex regions and excessive backtracking to revisit the missed areas. To address these limitations, this paper introduces an innovative dual-level topological analysis approach. First, we introduce a Low-level Topological Graph (LTG), generated through uniform sampling of the original map data, which captures essential geometric and connectivity details. Next, the LTG is transformed into a High-level Topological Graph (HTG) that represents the spatial layout and exploration completeness of convex regions, so that convex regions that are not fully explored are prioritized and unnecessary backtracking is minimized. Finally, a novel Local Artificial Potential Field (LAPF) method is employed for motion control, replacing conventional path planning and boosting overall efficiency. Experimental results highlight the effectiveness of our approach. Simulation tests reveal that our framework significantly reduces exploration time and travel distance, outperforming existing methods in both speed and efficiency. Ablation studies confirm the critical role of each framework component. Real-world tests demonstrate the robustness of our method in environments with poor mapping quality, surpassing other approaches in adaptability to mapping inaccuracies and inaccessible areas.
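The abstract does not detail how LAPF works. For orientation only, a classic (global) artificial potential field step combines an attractive pull toward a goal with a repulsive push from obstacles inside an influence radius. The sketch below shows that textbook formulation, not the paper's local variant; the function name `apf_step`, the gains, the influence radius and the step size are all hypothetical.

```python
import math

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0, step=0.1):
    """Move `step` units along the classic artificial-potential-field force.

    pos, goal: (x, y) tuples; obstacles: list of (x, y) point obstacles.
    All gains and distances are hypothetical, illustrative parameters.
    """
    # Attractive force: negative gradient of 0.5 * k_att * |pos - goal|^2.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsive force: negative gradient of 0.5*k_rep*(1/d - 1/influence)^2,
            # pointing away from the obstacle and active only inside `influence`.
            mag = k_rep * (1.0 / d - 1.0 / influence) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # force equilibrium: at the goal or in a local minimum
    # Take a fixed-length step in the direction of the net force.
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

With no obstacles, `apf_step((0.0, 0.0), (1.0, 0.0), [])` moves straight toward the goal to `(0.1, 0.0)`; an obstacle just below the path deflects the step upward. Plain potential fields are known to stall in local minima, which is one reason practical systems adapt them.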
Submitted 16 March, 2025;
originally announced March 2025.
-
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Authors:
Tao Feng,
Yihang Sun,
Jiaxuan You
Abstract:
The powerful capabilities of Large Language Models (LLMs) have led to their growing use in evaluating human-generated content, particularly in evaluating research ideas within academic settings. Existing solutions primarily rely on prompt-based LLM methods or fine-tuned lightweight language models for idea evaluation. However, these methods are often unstable and struggle to comprehend the complex semantic information embedded in the ideas, impeding their ability to perform high-quality evaluations. To address these challenges, we propose GraphEval, a lightweight graph-based LLM framework for idea evaluation. Our insight is that a complex idea can be broken down into comprehensible viewpoint nodes by prompting small LLMs. These viewpoint nodes can then be linked together through edges created from LLM-based relation extraction and/or BERT similarity scores. The resulting viewpoint graph can be used to conveniently propagate scores across view-nodes to improve the robustness of the idea evaluations. In particular, we propose two lightweight graph-based methods for idea evaluation: (1) GraphEval-LP: a training-free label propagation algorithm that propagates evaluation scores from known view-nodes to unknown nodes; (2) GraphEval-GNN: a Graph Neural Network (GNN) that is trained to predict the evaluation scores given the observed graph with minimal computational resources. Moreover, to overcome LLMs' limitations in objectively assessing the novelty of ideas, we further add a novelty detection model to GraphEval-GNN to enhance its ability to judge idea novelty. Experiments on two datasets show that GraphEval improves F1 scores by at least 14% with low computation and API costs. Additionally, GraphEval can effectively detect plagiarized ideas.
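GraphEval-LP is described only as training-free label propagation over the viewpoint graph. A minimal sketch of generic iterative label propagation follows: unknown nodes repeatedly take the weighted mean of their neighbors' scores while known nodes stay clamped. The edge-list format, the weights, the initialization and the function name `propagate_scores` are hypothetical and are not taken from GraphEval.

```python
def propagate_scores(edges, known, num_nodes, iters=100, tol=1e-9):
    """Spread known scores over an undirected weighted graph.

    edges: list of (u, v, weight) tuples (hypothetical input format).
    known: non-empty dict mapping node index -> known score; these stay clamped.
    Returns a list with one score per node.
    """
    neighbors = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    # Initialize unknown nodes at the mean of the known scores.
    default = sum(known.values()) / len(known)
    scores = [known.get(i, default) for i in range(num_nodes)]

    for _ in range(iters):
        new_scores = list(scores)
        delta = 0.0
        for i in range(num_nodes):
            if i in known or not neighbors[i]:
                continue  # clamped node, or isolated node keeps its value
            total_w = sum(w for _, w in neighbors[i])
            new_scores[i] = sum(w * scores[j] for j, w in neighbors[i]) / total_w
            delta = max(delta, abs(new_scores[i] - scores[i]))
        scores = new_scores
        if delta < tol:
            break  # converged
    return scores
```

For a three-node chain with the end scores fixed at 1.0 and 0.0, `propagate_scores([(0, 1, 1.0), (1, 2, 1.0)], {0: 1.0, 2: 0.0}, 3)` returns `[1.0, 0.5, 0.0]`.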
Submitted 16 March, 2025;
originally announced March 2025.
-
qReduMIS: A Quantum-Informed Reduction Algorithm for the Maximum Independent Set Problem
Authors:
Martin J. A. Schuetz,
Romina Yalovetzky,
Ruben S. Andrist,
Grant Salton,
Yue Sun,
Rudy Raymond,
Shouvanik Chakrabarti,
Atithi Acharya,
Ruslan Shaydulin,
Marco Pistoia,
Helmut G. Katzgraber
Abstract:
We propose and implement a quantum-informed reduction algorithm for the maximum independent set problem that integrates classical kernelization techniques with information extracted from quantum devices. Our larger framework consists of dedicated application, algorithm, and hardware layers, and easily generalizes to the maximum weight independent set problem. In this hybrid quantum-classical framework, which we call qReduMIS, the quantum computer is used as a co-processor to inform classical reduction logic about frozen vertices that are likely (or unlikely) to be in large independent sets, thereby opening up the reduction space after removal of targeted subgraphs. We systematically assess the performance of qReduMIS based on experiments with up to 231 qubits run on Rydberg quantum hardware available through Amazon Braket. Our experiments show that qReduMIS can help address fundamental performance limitations faced by a broad set of (quantum) solvers including Rydberg quantum devices. We outline implementations of qReduMIS with alternative platforms, such as superconducting qubits or trapped ions, and we discuss potential future extensions.
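qReduMIS builds on classical kernelization. To show what such reduction rules look like (this is not the qReduMIS logic itself), consider the textbook degree-0 and degree-1 rules. An isolated or pendant vertex belongs to some maximum independent set, so it can be taken and its neighbor deleted, shrinking the graph that a solver still has to handle. The function name and adjacency-dict format below are hypothetical.

```python
def reduce_low_degree(adj):
    """Apply the degree-0/degree-1 MIS reduction rules until none applies.

    adj: dict mapping vertex -> set of neighbors (undirected graph).
    Returns (chosen, kernel): vertices forced into an independent set, and the
    remaining reduced graph that still needs to be solved.
    """
    graph = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    chosen = set()
    changed = True
    while changed:
        changed = False
        for v in list(graph):
            if v not in graph:
                continue  # already deleted earlier in this pass
            if len(graph[v]) <= 1:
                # Isolated or pendant vertex: some maximum independent set
                # contains it, so take it and delete it with its neighbor.
                chosen.add(v)
                for u in {v} | graph[v]:
                    for w in graph.pop(u, set()):
                        if w in graph:
                            graph[w].discard(u)
                changed = True
    return chosen, graph
```

On the path 0-1-2-3 the rules alone solve the instance (two vertices chosen, empty kernel), while a triangle has no vertex of degree at most one and is returned unchanged as the kernel.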
Submitted 16 March, 2025;
originally announced March 2025.
-
LLMSeR: Enhancing Sequential Recommendation via LLM-based Data Augmentation
Authors:
Yuqi Sun,
Qidong Liu,
Haiping Zhu,
Feng Tian
Abstract:
Sequential Recommender Systems (SRS) have become a cornerstone of online platforms, leveraging users' historical interaction data to forecast their next potential engagement. Despite their widespread adoption, SRS often grapple with the long-tail user dilemma, resulting in less effective recommendations for individuals with limited interaction records. The advent of Large Language Models (LLMs), with their profound capability to discern semantic relationships among items, has opened new avenues for enhancing SRS through data augmentation. Nonetheless, current methodologies encounter obstacles, including the absence of collaborative signals and the prevalence of hallucination phenomena. In this work, we present LLMSeR, an innovative framework that utilizes LLMs to generate pseudo-prior items, thereby improving the efficacy of SRS. To address insufficient collaborative signals, we introduce the Semantic Interaction Augmentor (SIA), a method that integrates both semantic and collaborative information to comprehensively augment user interaction data. Moreover, to mitigate the adverse effects of hallucination in SRS, we develop Adaptive Reliability Validation (ARV), a validation technique designed to assess the reliability of the generated pseudo items. Complementing these advancements, we also devise a Dual-Channel Training strategy, ensuring seamless integration of data augmentation into the SRS training process. Extensive experiments conducted with three widely used SRS models demonstrate the generalizability and efficacy of LLMSeR.
Submitted 21 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Calibration of Complementary Metal-oxide-semiconductor Sensor-based Photometry to a Few-millimagnitude Precision: The Case of the Mini-SiTian Array
Authors:
Kai Xiao,
Yang Huang,
Haibo Yuan,
Zhirui Li,
Yongkang Sun,
Timothy C. Beers,
Min He,
Jifeng Liu,
Hong Wu,
Yongna Mao,
Bowen Huang,
Mingyang Ma,
Chuanjie Zheng,
Hongrui Gu,
Beichuan Wang,
Lin Yang,
Shuai Xu
Abstract:
We present a pioneering achievement in the high-precision photometric calibration of CMOS-based photometry, obtained by applying the Gaia BP/RP (XP) spectra-based synthetic photometry (XPSP) method to mini-SiTian array (MST) photometry. Through 79 repeated observations of the $\texttt{f02}$ field on a single night, we find good internal consistency in the calibrated MST $G_{\rm MST}$-band magnitudes for relatively bright stars, with a precision of about 4\,mmag for $G_{\rm MST}\sim 13$. Results from more than 30 different nights (over 3100 observations) further confirm this internal consistency, indicating that the 4\,mmag precision is stable and achievable over timescales of months. An independent external validation using spectroscopic data from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) DR10 and high-precision CCD photometric data from Gaia DR3 reveals a zero-point consistency better than 1\,mmag. Our results clearly demonstrate that CMOS photometry is on par with CCD photometry for high-precision work, highlighting the significant capabilities of CMOS cameras in astronomical observations, especially for large-scale telescope survey arrays.
Submitted 16 March, 2025;
originally announced March 2025.
-
Gas Transfer Between the Inner 3-kpc Disk and the Galactic Central Molecular Zone
Authors:
Yang Su,
Shiyu Zhang,
Yan Sun,
Ji Yang,
Fujun Du,
Min Fang,
Qing-Zeng Yan,
Shaobo Zhang,
Zhiwei Chen,
Xuepeng Chen,
Xin Zhou,
Lixia Yuan,
Yuehui Ma
Abstract:
We uncovered a more tilted molecular gas structure with highly negative velocities located near the dust lane. Our observations also show that the approaching gas flows from the overshoot process are captured by the bar's gravity and then flow towards the Galactic central molecular zone (CMZ) through the bar channel. The recycled gas from the overshoot effect, in conjunction with freshly accreted gas from the inner 3-kpc disk, accumulates significantly near $R_{\rm GC}\sim\frac{1}{2}R_{\rm bar}$ and $R_{\rm GC}\sim\frac{2}{3}R_{\rm bar}$, adopting a bar length of $\sim$3.2–3.4 kpc. Importantly, within these regions, there are frequent collisions and substantial angular momentum exchanges between gas flows with different trajectories. In this scenario, the dissipation arising from interactions between colliding flows, together with the varying torques induced by the nonaxisymmetric bar, effectively transfers the angular momentum of the viscous gas outward, thereby driving the molecular gas to settle into the CMZ within $\sim$3 orbital periods. A long-term gas inflow with an average rate of $>1.1\,M_\odot\,{\rm yr}^{-1}$, coupled with intense transient accretion events that exceed the average rate by several times due to the overshoot effect, significantly regulates the gas distribution, physical properties, and dynamical evolution of the CMZ. These findings provide robust observational evidence for elucidating the intricate dynamics of molecular gas flows towards the CMZ. Our results show that gas dynamics has a significant impact on the secular evolution of both the Milky Way and gas-rich external galaxies.
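For concreteness, the two accumulation radii follow directly from the adopted bar length of 3.2–3.4 kpc. This is simple arithmetic on the quoted fractions, rounded to 0.1 kpc, and is not an additional result from the paper:

```latex
R_{\rm bar} \approx 3.2\text{--}3.4\ \mathrm{kpc}
\;\Rightarrow\;
R_{\rm GC} \sim \tfrac{1}{2}R_{\rm bar} \approx 1.6\text{--}1.7\ \mathrm{kpc},
\qquad
R_{\rm GC} \sim \tfrac{2}{3}R_{\rm bar} \approx 2.1\text{--}2.3\ \mathrm{kpc}
```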
Submitted 4 May, 2025; v1 submitted 15 March, 2025;
originally announced March 2025.
-
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Authors:
Tianyi Zhao,
Boyang Liu,
Yanglei Gao,
Yiming Sun,
Maoxun Yuan,
Xingxing Wei
Abstract:
Multi-Modal Object Detection (MMOD), owing to its stronger adaptability to complex environments, has been widely applied in practice. Extensive research has been dedicated to RGB-IR object detection, primarily focusing on how to integrate complementary features from the RGB and IR modalities. However, these works neglect the problem of insufficient mono-modality learning, i.e., the decreased feature extraction capability of each modality under multi-modal joint learning. This leads to an unreasonable but prevalent phenomenon, Fusion Degradation, which hinders the performance improvement of MMOD models. Motivated by this, we introduce linear probing evaluation to multi-modal detectors and rethink the multi-modal object detection task from the perspective of mono-modality learning. To this end, we construct a novel framework called M$^2$D-LIF, which consists of the Mono-Modality Distillation (M$^2$D) method and the Local Illumination-aware Fusion (LIF) module. The M$^2$D-LIF framework facilitates sufficient mono-modality learning during multi-modal joint training and explores a lightweight yet effective feature fusion approach to achieve superior object detection performance. Extensive experiments on three MMOD datasets demonstrate that M$^2$D-LIF effectively mitigates the Fusion Degradation phenomenon and outperforms previous SOTA detectors.
Submitted 14 March, 2025;
originally announced March 2025.
-
Study of $φ\to K\bar{K}$ and $K_{S}^{0}-K_{L}^{0}$ asymmetry in the amplitude analysis of $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$ decay
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (701 additional authors not shown)
Abstract:
Using $e^+e^-$ annihilation data corresponding to a total integrated luminosity of 7.33 $\rm fb^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we provide the first amplitude analysis and absolute branching fraction measurement of the hadronic decay $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$. The branching fraction of $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$ is determined to be $(1.86\pm0.06_{\rm stat}\pm0.03_{\rm syst})\%$.
Combining the $\mathcal{B}(D_{s}^{+} \to φ(\to K_{S}^0K_{L}^0) π^+)$ obtained in this work and the world average of $\mathcal{B}(D_{s}^{+} \to φ(\to K^+K^-) π^+)$, we measure the relative branching fraction $\mathcal{B}(φ\to K_S^0K_L^0)/\mathcal{B}(φ\to K^+K^-) = 0.597 \pm 0.023_{\rm stat} \pm 0.018_{\rm syst} \pm 0.016_{\rm PDG}$, which deviates from the PDG value by more than $3σ$. Furthermore, the asymmetry of the branching fractions of $D^+_s\to K_{S}^0K^{*}(892)^{+}$ and $D^+_s\to K_{L}^0K^{*}(892)^{+}$, $\frac{\mathcal{B}(D_{s}^{+} \to K_{S}^0K^{*}(892)^{+})-\mathcal{B}(D_{s}^{+} \to K_{L}^0K^{*}(892)^{+})}{\mathcal{B}(D_{s}^{+} \to K_{S}^0K^{*}(892)^{+})+\mathcal{B}(D_{s}^{+} \to K_{L}^0K^{*}(892)^{+})}$, is determined to be $(-13.4\pm5.0_{\rm stat}\pm3.4_{\rm syst})\%$.
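The asymmetry above is a normalized difference of two branching fractions. Writing $r = \mathcal{B}_S/\mathcal{B}_L$ for the ratio of the $K_S^0$ and $K_L^0$ modes, the definition gives $A = (r-1)/(r+1)$, which inverts to $r = (1+A)/(1-A)$. A minimal sketch of both directions, assuming only central values are of interest (no uncertainty propagation); the function names are hypothetical.

```python
def asymmetry(b_s: float, b_l: float) -> float:
    """A = (B_S - B_L) / (B_S + B_L) for two branching fractions (hypothetical helper)."""
    total = b_s + b_l
    if total <= 0:
        raise ValueError("branching fractions must sum to a positive value")
    return (b_s - b_l) / total


def ratio_from_asymmetry(a: float) -> float:
    """Invert the definition: B_S / B_L = (1 + A) / (1 - A), valid for -1 < A < 1."""
    if not -1.0 < a < 1.0:
        raise ValueError("asymmetry must lie strictly between -1 and 1")
    return (1.0 + a) / (1.0 - a)
```

For the reported central value $A = -13.4\%$, `ratio_from_asymmetry(-0.134)` gives about 0.76, i.e. at the central value the $K_L^0K^{*}(892)^{+}$ mode is the larger of the two.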
Submitted 23 March, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Difference-in-Differences Meets Synthetic Control: Doubly Robust Identification and Estimation
Authors:
Yixiao Sun,
Haitian Xie,
Yuhang Zhang
Abstract:
Difference-in-Differences (DiD) and Synthetic Control (SC) are widely used methods for causal inference in panel data, each with its own strengths and limitations. In this paper, we propose a novel methodology that integrates the advantages of both DiD and SC approaches. Our integrated approach provides a doubly robust identification strategy for causal effects in panel data with a group structure, identifying the average treatment effect on the treated (ATT) under either the parallel trends assumption or the group-level SC assumption. Building on this identification result, we develop a unified semiparametric framework for estimating the ATT. Notably, while the identification-robust moment function satisfies Neyman orthogonality under the parallel trends assumption, it does not under the SC assumption, leading to different asymptotic variances under these two identification strategies. To address this challenge, we propose a multiplier bootstrap method that consistently approximates the asymptotic distribution, regardless of which identification assumption holds. Furthermore, we extend our methodology to accommodate repeated cross-sectional data and staggered treatment designs. As an empirical application, we apply our method to evaluate the impact of the 2003 minimum wage increase in Alaska on family income.
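The two ingredients named above can be illustrated in their simplest form: the canonical two-period DiD estimate of the ATT under parallel trends, and a multiplier bootstrap that perturbs each unit's centered contribution with i.i.d. standard-normal weights instead of resampling units. This is a minimal sketch under those textbook assumptions only; it is not the authors' doubly robust estimator (which also uses the group-level SC assumption), and all function names and defaults are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def first_differences(pre: Sequence[float], post: Sequence[float]) -> List[float]:
    """Per-unit change in the outcome between the pre and post periods."""
    return [b - a for a, b in zip(pre, post)]


def did_att(d_treated: Sequence[float], d_control: Sequence[float]) -> float:
    """Two-period DiD: mean change of treated units minus mean change of controls."""
    return statistics.fmean(d_treated) - statistics.fmean(d_control)


def multiplier_bootstrap(d_treated: Sequence[float], d_control: Sequence[float],
                         draws: int = 2000, seed: int = 0) -> List[float]:
    """Bootstrap draws of the DiD estimate using N(0, 1) multipliers on centered terms."""
    rng = random.Random(seed)
    m_t = statistics.fmean(d_treated)
    m_c = statistics.fmean(d_control)
    att = m_t - m_c
    out = []
    for _ in range(draws):
        # Each unit's deviation from its group mean is scaled by an i.i.d. multiplier.
        shift_t = sum(rng.gauss(0.0, 1.0) * (d - m_t) for d in d_treated) / len(d_treated)
        shift_c = sum(rng.gauss(0.0, 1.0) * (d - m_c) for d in d_control) / len(d_control)
        out.append(att + shift_t - shift_c)
    return out


def percentile_interval(draws: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Simple percentile interval from bootstrap draws."""
    s = sorted(draws)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi
```

The multiplier form is convenient because it only needs each unit's contribution to a linear (or linearized) estimator, which is also why it extends naturally to estimators whose asymptotic variance depends on which identifying assumption holds.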
Submitted 14 March, 2025;
originally announced March 2025.
-
Integrating Dynamical Systems Modeling with Spatiotemporal scRNA-seq Data Analysis
Authors:
Zhenyi Zhang,
Yuhao Sun,
Qiangwei Peng,
Tiejun Li,
Peijie Zhou
Abstract:
Understanding the dynamic nature of biological systems is fundamental to deciphering cellular behavior, developmental processes, and disease progression. Single-cell RNA sequencing (scRNA-seq) has provided static snapshots of gene expression, offering valuable insights into cellular states at a single time point. Recent advancements in temporally resolved scRNA-seq, spatial transcriptomics (ST), and time-series spatial transcriptomics (temporal-ST) have further revolutionized our ability to study the spatiotemporal dynamics of individual cells. These technologies, when combined with computational frameworks such as Markov chains, stochastic differential equations (SDEs), and generative models like optimal transport and Schrödinger bridges, enable the reconstruction of dynamic cellular trajectories and cell fate decisions. This review discusses how these dynamical system approaches offer new opportunities to model and infer cellular dynamics from a systematic perspective.
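Of the approaches listed above, optimal transport is the easiest to show in a few lines: it couples cells observed at one time point with cells observed at the next. The following is a minimal sketch of entropy-regularized OT solved with Sinkhorn iterations, assuming one-dimensional expression values, uniform cell weights and a squared-distance cost. The function name, `eps` and the stopping rule are hypothetical, and this is the textbook algorithm rather than any specific framework discussed in the review.

```python
import math
from typing import List, Sequence


def sinkhorn_coupling(x: Sequence[float], y: Sequence[float], eps: float = 0.1,
                      tol: float = 1e-9, max_iters: int = 10_000) -> List[List[float]]:
    """Entropic OT plan between uniform empirical measures on x (time t) and y (time t+1).

    Cost is squared distance; eps is the entropic regularization strength. Very small
    eps relative to the costs can underflow the kernel (log-domain variants avoid this).
    """
    n, m = len(x), len(y)
    a = [1.0 / n] * n  # uniform mass on cells observed at time t
    b = [1.0 / m] * m  # uniform mass on cells observed at time t+1
    K = [[math.exp(-((xi - yj) ** 2) / eps) for yj in y] for xi in x]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(max_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        # After the v-update the column sums equal b exactly; check the row sums.
        row_err = max(abs(u[i] * sum(K[i][j] * v[j] for j in range(m)) - a[i])
                      for i in range(n))
        if row_err < tol:
            break
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Dividing row $i$ of the returned plan by its sum gives the conditional distribution over time-$(t+1)$ cells for cell $i$ at time $t$, which can be read as a one-step transition matrix between the two snapshots.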
Submitted 30 April, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.