-
Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
Authors:
Ziqi Miao,
Lijun Li,
Yuan Xiong,
Zhenhua Liu,
Pengyu Zhu,
Jing Shao
Abstract:
Contextual priming, where earlier stimuli covertly bias later judgments, offers an unexplored attack surface for large language models (LLMs). We uncover a contextual priming vulnerability in which the previous response in a dialogue can steer the model's subsequent behavior toward policy-violating content. Building on this insight, we propose Response Attack (RA), which uses an auxiliary LLM to generate a mildly harmful response to a paraphrased version of the original malicious query. The paraphrased query and the response are then formatted into the dialogue and followed by a succinct trigger prompt, thereby priming the target model to generate harmful content. Across eight open-source and proprietary LLMs, RA consistently outperforms seven state-of-the-art jailbreak techniques, achieving higher attack success rates. To mitigate this threat, we construct and release a context-aware safety fine-tuning dataset, which significantly reduces the attack success rate while preserving model capabilities. The code and data are available at https://github.com/Dtc7w3PQ/Response-Attack.
Submitted 7 July, 2025;
originally announced July 2025.
-
Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack
Authors:
Keke Tang,
Ziyong Du,
Weilong Peng,
Xiaofei Wang,
Peican Zhu,
Ligang Liu,
Zhihong Tian
Abstract:
Adversarial attacks on point clouds often impose strict geometric constraints to preserve plausibility; however, such constraints inherently limit transferability and undefendability. While deformation offers an alternative, existing unstructured approaches may introduce unnatural distortions, making adversarial point clouds conspicuous and undermining their plausibility. In this paper, we propose CageAttack, a cage-based deformation framework that produces natural adversarial point clouds. It first constructs a cage around the target object, providing a structured basis for smooth, natural-looking deformation. Perturbations are then applied to the cage vertices, which seamlessly propagate to the point cloud, ensuring that the resulting deformations remain intrinsic to the object and preserve plausibility. Extensive experiments on seven 3D deep neural network classifiers across three datasets show that CageAttack achieves a superior balance among transferability, undefendability, and plausibility, outperforming state-of-the-art methods. Codes will be made public upon acceptance.
Submitted 1 July, 2025;
originally announced July 2025.
-
DeepDive: A deep dive into the physics of the first massive quiescent galaxies in the Universe
Authors:
K. Ito,
F. Valentino,
G. Brammer,
M. L. Hamadouche,
K. E. Whitaker,
V. Kokorev,
P. Zhu,
T. Kakimoto,
P. -F. Wu,
J. Antwi-Danso,
W. M. Baker,
D. Ceverino,
A. L. Faisst,
M. Farcy,
S. Fujimoto,
A. Gallazzi,
S. Gillman,
R. Gottumukkala,
K. E. Heintz,
M. Hirschmann,
C. K. Jespersen,
M. Kubo,
M. Lee,
G. Magdis,
M. Onodera
, et al. (4 additional authors not shown)
Abstract:
We present the DeepDive program, in which we obtained deep ($1-3$ hours) JWST/NIRSpec G235M/F170LP spectra for 10 primary massive ($\log{(M_\star/M_\odot)}=10.8-11.5$) quiescent galaxies at $z\sim3-4$. A novel reduction procedure extends the nominal wavelength coverage of G235M beyond H$α$ and [NII] at $z\sim4$, revealing weak, narrow H$α$ lines indicative of low star formation rates (${\rm SFR}\sim0-5\, M_\odot\, {\rm yr^{-1}}$). Two out of 10 primary targets have broad H$α$ lines, indicating the presence of AGNs. We also conduct an archival search of quiescent galaxies observed with NIRSpec gratings in the DAWN JWST Archive, which provides a statistical context for interpreting the DeepDive targets. This archival search provides a spectroscopic sample of 140 quiescent galaxies spanning $1<z<5$ and covering more than an order of magnitude in stellar mass. We revisit the selection of quiescent galaxies based on rest-frame $UVJ$ colors, specific star formation rates, and the detection of the 4000Å spectral break, finding $\sim90\%$ overlap between these criteria. The sample of a total of 150 quiescent galaxies constructed in this study shows that those at $z\sim3-5$, including the DeepDive targets, typically exhibit weaker 4000Å breaks and bluer colors than their lower-redshift counterparts, indicating generally younger stellar populations. Stacked spectra of sources grouped by the $D_n4000$ index reveal faint Iron and Magnesium absorption line features in the stellar continuum even for the low $D_n4000$ ($D_n4000<1.35$) subsample at high redshift ($z\sim3$). In addition, higher $D_n4000$ subsamples show fainter nebular emission lines. These results demonstrate that medium-resolution NIRSpec spectroscopy is essential for robustly characterizing the diversity and evolution of early quiescent galaxies. All data from this study will be made publicly available.
Submitted 27 June, 2025;
originally announced June 2025.
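The $D_n4000$ index used to group the stacked spectra in the abstract above can be illustrated with a short sketch. It assumes the common narrow-band definition (the ratio of the mean flux density per unit frequency in the rest-frame 4000-4100 Å band to that in the 3850-3950 Å band); the abstract does not state which definition or measurement code the authors used. The function name `dn4000` and the toy spectrum are hypothetical, and the simple averaging assumes a uniformly sampled wavelength grid.

```python
def dn4000(wavelength_A, f_nu):
    """Narrow 4000 A break index: mean f_nu in 4000-4100 A over mean f_nu in 3850-3950 A.

    wavelength_A: rest-frame wavelengths in Angstrom (assumed uniformly sampled).
    f_nu: flux density per unit frequency at each wavelength (arbitrary units).
    """
    def band_mean(lo, hi):
        # Average the flux samples that fall inside the band [lo, hi].
        vals = [f for w, f in zip(wavelength_A, f_nu) if lo <= w <= hi]
        if not vals:
            raise ValueError(f"no samples in the {lo}-{hi} A band")
        return sum(vals) / len(vals)

    return band_mean(4000.0, 4100.0) / band_mean(3850.0, 3950.0)


# Toy spectrum (illustrative only): a step from 1.0 to 1.5 at 3975 A.
wave = [3800.0 + i for i in range(401)]          # 3800 ... 4200 A in 1 A steps
flux = [1.0 if w < 3975.0 else 1.5 for w in wave]

index = dn4000(wave, flux)
# The abstract's low-D_n4000 subsample is defined by D_n4000 < 1.35.
subsample = "low" if index < 1.35 else "high"
```

On real data the spectrum would first be shifted to the rest frame, and the flux would need to be in $f_\nu$ rather than $f_\lambda$ for this definition to apply.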
-
Probing valence electron and hydrogen dynamics using charge-pair imaging with ultrafast electron diffraction
Authors:
Tianyu Wang,
Hui Jiang,
Ming Zhang,
Xiao Zou,
Pengfei Zhu,
Feng He,
Zheng Li,
Dao Xiang
Abstract:
A key challenge in ultrafast science has been to directly track the coupled motions of electrons and nuclei in real space and real time. This study presents a significant step towards this goal by demonstrating the feasibility of time-resolved real-space tracking of valence electron and hydrogen dynamics during the photodissociation of ammonia (NH3) using MeV ultrafast electron diffraction. It is demonstrated that the enhanced temporal resolution, in conjunction with the analysis of the charge-pair distribution function, enables the disentanglement of the correlated motion of valence electrons and hydrogens in photoexcited ammonia molecules. The methodology employed in this study, which utilizes the charge-pair distribution function from ultrafast electron scattering to retrieve intertwined electron and nucleus dynamics, may open up new opportunities in the study of quantum dynamics for a wide range of molecules.
Submitted 26 June, 2025;
originally announced June 2025.
-
Nonlinear edge localized mode with impurity seeding in CFETR hybrid scenario
Authors:
Shiyong Zeng,
Ping Zhu
Abstract:
A critical challenge for operating fusion burning plasma in high confinement mode lies in mitigating damage caused by edge localized modes (ELMs). While impurity seeding has been experimentally validated as a reliable and effective ELM mitigation technique, its underlying physics remains insufficiently understood and requires further clarification. Through nonlinear magnetohydrodynamic (MHD) simulations, this work reproduces key features of natural ELM crash and reveals its trigger mechanism. Impurity seeding significantly affects nonlinear ELM dynamics by inducing local and global modifications to the pedestal pressure profile, driving high-n ballooning mode instabilities that govern ELM crash. Two critical control parameters -- impurity density level and poloidal seeding location -- are systematically investigated, which play key roles in the ELM crash onset timing and the resulting energy loss magnitude.
Submitted 25 June, 2025;
originally announced June 2025.
-
Dual Synchronization Effects in Light Scattering by Spherical Particle Systems
Authors:
Guanglang Xu,
Bingqiang Sun,
Ping Zhu,
Huizeng Liu,
Ye Zhou,
Chen Zhou
Abstract:
We report the discovery of a novel and fundamental dual synchronization relationship between the scattering efficiency (Q$_{\text{sca}}$) and a specifically formulated angular distribution complexity parameter ($\widetilde{C}_{\text{p}}$) in spherical particle systems. Through extensive numerical simulations using the rigorous Multiple Sphere T-Matrix (MSTM) method, we found that Q$_{\text{sca}}$ exhibits a strong positive correlation with (1-$\widetilde{C}_{\text{p}}$) when the real part of the refractive index is varied, while it synchronizes strongly and positively with $\widetilde{C}_{\text{p}}$ when the imaginary part is varied. Our analysis reveals that this duality arises from the distinct ways the real and imaginary parts of the refractive index perturb versus dampen electromagnetic resonances within the particles, leading to different coupled responses in the total scattered energy and the angular distribution. This discovery provides unprecedented insights into how phase contrast and absorption processes distinctly modulate scattering properties and the angular distribution of scattered light, particularly in regimes dominated by resonance. It establishes that the specific formulation of $\widetilde{C}_{\text{p}}$ used here is sensitive to the overall balance of multipole contributions, making it a valuable parameter for capturing refractive index-driven changes.
Submitted 25 June, 2025;
originally announced June 2025.
-
MHD simulation of tilt instability during the dynamic FRC magnetic compression process
Authors:
Yiming Ma,
Ping Zhu,
Bo Rao,
Haolong Li
Abstract:
The nonlinear evolution of the tilt instability in a field reversed configuration (FRC) during the dynamic magnetic compression process has been investigated using magnetohydrodynamic (MHD) simulations with the NIMROD code [C. R. Sovinec et al., J. Comput. Phys. 195, 355 (2004)]. The tilt mode induces significant deformations in the linear growth phase and results in complete confinement loss of the FRC in the nonlinear phase, with no evidence of dynamic nonlinear stabilization. The growth rate of the tilt mode increases with the compression field ramping rate and approaches an asymptotic value. Toroidal flow can reduce both the growth rate and the nonlinear saturation amplitude of the tilt mode. The stabilizing effect of the toroidal rotation is enhanced with higher compression field ramping rates due to the spontaneous toroidal field generation and increased flow shear during compression. Although the tilt mode remains unstable with a toroidal rotation Mach number close to 0.5, the onset of tilt distortion can be delayed, allowing a magnetic compression ratio up to 5.3 before the compressional heating terminates.
Submitted 25 June, 2025;
originally announced June 2025.
-
UniMind: Unleashing the Power of LLMs for Unified Multi-Task Brain Decoding
Authors:
Weiheng Lu,
Chunfeng Song,
Jiamin Wu,
Pengyu Zhu,
Yuchen Zhou,
Weijian Mai,
Qihao Zheng,
Wanli Ouyang
Abstract:
Decoding human brain activity from electroencephalography (EEG) signals is a central challenge at the intersection of neuroscience and artificial intelligence, enabling diverse applications in mental state assessment, clinical monitoring, and human-machine interaction. Recent efforts have extensively explored EEG-based brain foundation models for generalized brain decoding, employing large-scale training on multiple datasets. However, most of these attempts struggle with generalizability and fail to achieve satisfactory performance without task-specific tuning due to pronounced inherent heterogeneity among decoding tasks. To address these challenges, we present UniMind, a general-purpose EEG foundation model for unified multi-task brain decoding by uniquely unleashing the power of large language models to comprehend complex neural patterns. UniMind offers several advantages. First, we design a Neuro-Language Connector to bridge the modality gap between neural signals and large language models, distilling and transforming the spatiotemporal neural patterns of EEG data into representations understandable by language models. Second, a Task-aware Query Selection module is proposed to inject task-awareness into the cross-modal alignment by dynamically generating task-adaptive query tokens, enabling learning of task-relevant neural patterns across diverse tasks. Extensive experiments across ten datasets demonstrate that UniMind substantially outperforms state-of-the-art multi-task decoding models, with an average gain of 12 percent, while also offering valuable neuroscientific insights into neural functional correlations across tasks. The code will be made publicly available.
Submitted 23 June, 2025;
originally announced June 2025.
-
Mapping the Excitation Mechanisms in the LINER I Active Galactic Nucleus NGC 5005: Positive Feedback and a Thin LINER Cocoon
Authors:
Anna Trindade Falcão,
G. Fabbiano,
M. Elvis,
P. Zhu,
S. Kraemer,
W. P. Maksym,
R. Middei,
D. L. Król
Abstract:
We present a spatially resolved Baldwin-Phillips-Terlevich analysis of the narrow-line region (NLR) in the low-ionization nuclear emission-line region (LINER) I galaxy NGC 5005 using Hubble Space Telescope narrowband imaging of [O III]$λ$5007, H$β$, H$α$, and [S II]$λλ$6717,6731. With a resolution of ${\lesssim}$0.1 (${\lesssim}$10 pc at z = 0.003), we dissect the NLR into H II (star-forming), Seyfert, and LINERs across spatial scales extending up to r$\sim$8 kpc from the nucleus. Our results reveal a compact nuclear region exhibiting Seyfert-like emission, consistent with photoionization by a low-luminosity active galactic nucleus (AGN). Surrounding this Seyfert-like nucleus is a thin ($\sim$20 pc thick) higher-excitation LINER-like cocoon, likely arising from shock-excited gas in the interstellar medium (ISM). Beyond this cocoon, a centrally localized extended (r$\sim$1 kpc) LINER-like region surrounds the Seyfert-like nucleus and cocoon, likely ionized by the AGN, while a more extended (r${\gtrsim}$2 kpc) LINER-like zone may be ionized by a combination of post-AGB stars and shocks from gas inflows. We also detect H II-like regions at both small and large scales. In the inner 500 pc, these regions may be triggered by jet-ISM interactions, potentially inducing localized star formation. At r$\sim$4 kpc, we identify an outer H II-like region tracing a large-scale star-forming ring, where ionization is dominated by young stars.
Submitted 17 June, 2025;
originally announced June 2025.
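The H II / Seyfert / LINER split described in the NGC 5005 abstract above can be sketched per pixel on the [S II] diagnostic diagram. The abstract does not say which demarcation lines the authors adopted; the sketch below assumes the widely used Kewley et al. curves for the [S II]/H$α$ versus [O III]/H$β$ plane, and the function name `classify_sii_bpt` and the sample ratios are hypothetical.

```python
def classify_sii_bpt(log_sii_ha, log_oiii_hb):
    """Classify one line-ratio pair on the [S II] BPT diagram.

    log_sii_ha:  log10([S II] 6717,6731 / H-alpha)
    log_oiii_hb: log10([O III] 5007 / H-beta)
    Returns "HII", "Seyfert", or "LINER".
    """
    # Assumed maximum-starburst curve; it is only defined left of its asymptote at 0.32.
    if log_sii_ha < 0.32:
        starburst_limit = 0.72 / (log_sii_ha - 0.32) + 1.30
        if log_oiii_hb < starburst_limit:
            return "HII"
    # Assumed Seyfert/LINER dividing line for points above the starburst curve.
    seyfert_liner_line = 1.89 * log_sii_ha + 0.76
    return "Seyfert" if log_oiii_hb > seyfert_liner_line else "LINER"


# Illustrative ratios only; these are not measurements from the paper.
labels = [
    classify_sii_bpt(-0.8, -0.5),  # low [S II]/H-alpha, low [O III]/H-beta
    classify_sii_bpt(-0.3, 1.0),   # high [O III]/H-beta
    classify_sii_bpt(0.0, 0.2),    # strong [S II], modest [O III]
]
```

Applied to every pixel of the four narrowband line maps, a function of this kind would yield the spatially resolved excitation map that the abstract describes.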
-
Surprise Calibration for Better In-Context Learning
Authors:
Zhihang Tan,
Jingrui Hou,
Ping Wang,
Qibiao Hu,
Peng Zhu
Abstract:
In-context learning (ICL) has emerged as a powerful paradigm for task adaptation in large language models (LLMs), where models infer underlying task structures from a few demonstrations. However, ICL remains susceptible to biases that arise from prior knowledge and contextual demonstrations, which can degrade the performance of LLMs. Existing bias calibration methods typically apply fixed class priors across all inputs, limiting their efficacy in dynamic ICL settings where the context for each query differs. To address these limitations, we adopt implicit sequential Bayesian inference as a framework for interpreting ICL, identify "surprise" as an informative signal for class prior shift, and introduce a novel method--Surprise Calibration (SC). SC leverages the notion of surprise to capture the temporal dynamics of class priors, providing a more adaptive and computationally efficient solution for in-context learning. We empirically demonstrate the superiority of SC over existing bias calibration techniques across a range of benchmark natural language processing tasks.
Submitted 17 June, 2025; v1 submitted 15 June, 2025;
originally announced June 2025.
-
Serving Large Language Models on Huawei CloudMatrix384
Authors:
Pengfei Zuo,
Huimin Lin,
Junbo Deng,
Nan Zou,
Xingkun Yang,
Yingyu Diao,
Weifeng Gao,
Ke Xu,
Zhangyu Chen,
Shirui Lu,
Zhao Qiu,
Peiyang Li,
Xianyu Chang,
Zhengzhong Yu,
Fangzheng Miao,
Jia Zheng,
Ying Li,
Yuan Feng,
Bei Wang,
Zaijian Zong,
Mosong Zhou,
Wenli Zhou,
Houjiang Chen,
Xingyu Liao,
Yipeng Li
, et al. (21 additional authors not shown)
Abstract:
The rapid evolution of large language models (LLMs), driven by growing parameter scales, adoption of mixture-of-experts (MoE) architectures, and expanding context lengths, imposes unprecedented demands on AI infrastructure. Traditional AI clusters face limitations in compute intensity, memory bandwidth, inter-chip communication, and latency, compounded by variable workloads and strict service-level objectives. Addressing these issues requires fundamentally redesigned hardware-software integration. This paper introduces Huawei CloudMatrix, a next-generation AI datacenter architecture, realized in the production-grade CloudMatrix384 supernode. It integrates 384 Ascend 910 NPUs and 192 Kunpeng CPUs interconnected via an ultra-high-bandwidth Unified Bus (UB) network, enabling direct all-to-all communication and dynamic pooling of resources. These features optimize performance for communication-intensive operations, such as large-scale MoE expert parallelism and distributed key-value cache access. To fully leverage CloudMatrix384, we propose CloudMatrix-Infer, an advanced LLM serving solution incorporating three core innovations: a peer-to-peer serving architecture that independently scales prefill, decode, and caching; a large-scale expert parallelism strategy supporting EP320 via efficient UB-based token dispatch; and hardware-aware optimizations including specialized operators, microbatch-based pipelining, and INT8 quantization. Evaluation with the DeepSeek-R1 model shows CloudMatrix-Infer achieves state-of-the-art efficiency: prefill throughput of 6,688 tokens/s per NPU and decode throughput of 1,943 tokens/s per NPU (<50 ms TPOT). It effectively balances throughput and latency, sustaining 538 tokens/s per NPU even under stringent 15 ms latency constraints, while INT8 quantization maintains model accuracy across benchmarks.
Submitted 19 June, 2025; v1 submitted 14 June, 2025;
originally announced June 2025.
-
A Theoretical Three-Dimensional Diagram to Separate Star Formation, Active Galactic Nuclei, and Shocks in Galaxies
Authors:
Peixin Zhu,
Lisa J. Kewley,
Ralph S. Sutherland,
Kathryn Grasha
Abstract:
The excitation sources in galaxies, including star formation, active galactic nuclei (AGNs), and shock excitation, are frequently mixed due to AGN and stellar feedback. Disentangling star formation, AGNs, and shocks in galaxy integral-field unit (IFU) spectra at optical wavelengths is crucial to expanding the galaxy sample for AGN and stellar feedback studies, given the lack of multiwavelength observations for most of the galaxies that are observed at optical wavelengths. Previous methods to address this issue either have a limited application range or are highly uncertain in separating AGN from shock excitation (D'Agostino et al. 2019; Johnston et al. 2023). Here, we propose a theoretical three-dimensional (3D) diagram. This theoretical 3D diagram overcomes the limitations of previous methods and can simultaneously separate star formation, AGNs, and shocks in active galaxies. Along with the separation, the new theoretical 3D diagram also constrains the gas metallicity, ionization parameter, and gas pressure within the galaxy. By applying the theoretical 3D diagram to the Very Large Telescope (VLT)/MUSE IFU data and the Wide Field Spectrograph IFU data for NGC5728, we find a star-forming ring surrounding the galaxy center with a projected radius of $\sim$1 kpc in the sky plane, an AGN-ionized bicone extending up to $\sim$2 kpc from the nuclear center, and a fast-shock-dominated disk region at the base of the AGN outflow, which is likely associated with a nuclear accretion disk or a result of jet-ISM interaction. The theoretical 3D diagram opens a new window to study the interplay among star formation, AGNs, and shocks in active galaxies.
Submitted 11 June, 2025;
originally announced June 2025.
-
Seedance 1.0: Exploring the Boundaries of Video Generation Models
Authors:
Yu Gao,
Haoyuan Guo,
Tuyen Hoang,
Weilin Huang,
Lu Jiang,
Fangyuan Kong,
Huixia Li,
Jiashi Li,
Liang Li,
Xiaojie Li,
Xunsong Li,
Yifu Li,
Shanchuan Lin,
Zhijie Lin,
Jiawei Liu,
Shu Liu,
Xiaonan Nie,
Zhiwu Qing,
Yuxi Ren,
Li Sun,
Zhi Tian,
Rui Wang,
Sen Wang,
Guoqiang Wei,
Guohong Wu
, et al. (19 additional authors not shown)
Abstract:
Notable breakthroughs in diffusion modeling have propelled rapid improvements in video generation, yet current foundation models still face critical challenges in simultaneously balancing prompt following, motion plausibility, and visual quality. In this report, we introduce Seedance 1.0, a high-performance and inference-efficient video foundation generation model that integrates several core technical improvements: (i) multi-source data curation augmented with precise and meaningful video captioning, enabling comprehensive learning across diverse scenarios; (ii) an efficient architecture design with a proposed training paradigm, which natively supports multi-shot generation and joint learning of both text-to-video and image-to-video tasks; (iii) carefully optimized post-training approaches leveraging fine-grained supervised fine-tuning and video-specific RLHF with multi-dimensional reward mechanisms for comprehensive performance improvements; (iv) excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies and system-level optimizations. Seedance 1.0 can generate a 5-second video at 1080p resolution in only 41.4 seconds (NVIDIA-L20). Compared to state-of-the-art video generation models, Seedance 1.0 stands out with high-quality and fast video generation, offering superior spatiotemporal fluidity with structural stability, precise instruction adherence in complex multi-subject contexts, and native multi-shot narrative coherence with consistent subject representation.
Submitted 28 June, 2025; v1 submitted 10 June, 2025;
originally announced June 2025.
-
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation
Authors:
Chao Yin,
Hao Li,
Kequan Yang,
Jide Li,
Pinpin Zhu,
Xiaoqiang Li
Abstract:
While promptable segmentation (e.g., SAM) has shown promise for various segmentation tasks, it still requires manual visual prompts for each object to be segmented. In contrast, task-generic promptable segmentation aims to reduce the need for such detailed prompts by employing only a task-generic prompt to guide segmentation across all test samples. However, when applied to Camouflaged Object Segmentation (COS), current methods still face two critical issues: 1) semantic ambiguity in getting instance-specific text prompts, which arises from insufficient discriminative cues in holistic captions, leading to foreground-background confusion; 2) semantic discrepancy combined with spatial separation in getting instance-specific visual prompts, which results from global background sampling far from object boundaries with low feature correlation, causing SAM to segment irrelevant regions. To address the issues above, we propose RDVP-MSD, a novel training-free test-time adaptation framework that synergizes Region-constrained Dual-stream Visual Prompting (RDVP) via Multimodal Stepwise Decomposition Chain of Thought (MSD-CoT). MSD-CoT progressively disentangles image captions to eliminate semantic ambiguity, while RDVP injects spatial constraints into visual prompting and independently samples visual prompts for foreground and background points, effectively mitigating semantic discrepancy and spatial separation. Without requiring any training or supervision, RDVP-MSD achieves a state-of-the-art segmentation result on multiple COS benchmarks and delivers a faster inference speed than previous methods, demonstrating significantly improved accuracy and efficiency. The code will be available at https://github.com/ycyinchao/RDVP-MSD
Submitted 6 July, 2025; v1 submitted 7 June, 2025;
originally announced June 2025.
-
En Route Path-planning for Partially Occupied Vehicles in Ride-pooling Systems
Authors:
Pengbo Zhu,
Giancarlo Ferrari-Trecate,
Nikolas Geroliminis
Abstract:
Ride-pooling services, such as UberPool and Lyft Shared Saver, enable a single vehicle to serve multiple customers within one shared trip. Efficient path-planning algorithms are crucial for improving the performance of such systems. For partially occupied vehicles with available capacity, we introduce a novel routing algorithm designed to maximize the likelihood of picking up additional passengers while serving the current passengers to their destination. Unlike traditional methods that group passengers and vehicles based on predefined time windows, our algorithm allows for immediate responses to passenger requests. Our approach optimizes travel time while dynamically considering passenger demand and coordinating with other vehicles. Formulated as an integer linear programming (ILP) problem, our method is computationally efficient and suitable for real-time applications. Simulation results demonstrate that our proposed method can significantly enhance service quality.
Submitted 5 June, 2025;
originally announced June 2025.
-
Normal Distribution of Crab Pulsar Glitch Activity from a Glitch Cluster Perspective
Authors:
Pei-Xin Zhu,
Xiao-Ping Zheng
Abstract:
As the most extensively and continuously monitored neutron star, the Crab pulsar serves as a representative of the earliest evolutionary stage. Its unique and complex glitch phenomenology provides an unparalleled testing ground for theoretical models of neutron star interior dynamics. Within the self-organized criticality paradigm, Crab pulsar glitch sizes are modeled by a power-law distribution and waiting times by an exponential distribution. However, this framework is incompatible with neutron-star microphysics and fails to account for the quasi-periodic glitch behavior. Using a glitch-clustering perspective, which is motivated by the hypothesis that each event releases only a fraction of the stored angular momentum, we merged small glitches occurring within short temporal separations. We reveal a correlation between glitch size and waiting time and find that the waiting-time cumulative distribution function follows a normal distribution. Crucially, without recourse to complex statistical models, this approach permits a reasonable forecast of the next glitch. From the perspective of the dense-glitch region, the Crab pulsar currently falls within the $3σ$-$4σ$ probability interval. Assuming a long-term periodicity of 6.68 years, the $\pm1σ$ interval defines a time window extending from the present to approximately 387 days ahead, which implies that the next glitch could emerge at any time before MJD 61081 (February 2026).
Submitted 3 June, 2025;
originally announced June 2025.
-
Science Prospects for the Southern Wide-field Gamma-ray Observatory: SWGO
Authors:
SWGO Collaboration,
P. Abreu,
R. Alfaro,
A. Alfonso,
M. Andrade,
E. O. Angüner,
E. A. Anita-Rangel,
O. Aquines-Gutiérrez,
C. Arcaro,
R. Arceo,
J. C. Arteaga-Velázquez,
P. Assis,
H. A. Ayala Solares,
A. Bakalova,
E. M. Bandeira,
P. Bangale,
U. Barres de Almeida,
P. Batista,
I. Batković,
J. Bazo,
E. Belmont,
J. Bennemann,
S. Y. BenZvi,
A. Bernal,
W. Bian
, et al. (295 additional authors not shown)
Abstract:
Ground-based gamma-ray astronomy is now well established as a key observational approach to address critical topics at the frontiers of astroparticle physics and high-energy astrophysics. Whilst the field of TeV astronomy was once dominated by arrays of atmospheric Cherenkov Telescopes, ground-level particle detection has now been demonstrated to be an equally viable and strongly complementary approach. Ground-level particle detection provides continuous monitoring of the overhead sky, critical for the mapping of extended structures and capturing transient phenomena. As demonstrated by HAWC and LHAASO, the technique provides the best available sensitivity above a few tens of TeV, and for the first time access to the PeV energy range. Despite the success of this approach, there is so far no major ground-level particle-based observatory with access to the Southern sky. HESS, located in Namibia, is the only major gamma-ray instrument in the Southern Hemisphere, and has shown the extraordinary richness of the inner galaxy in the TeV band, but is limited in terms of field of view and energy reach.
SWGO is an international effort to construct the first wide-field instrument in the south with deep sensitivity from 100s of GeV into the PeV domain. The project is now close to the end of its development phase and planning for construction of the array in Chile has begun. Here we describe the baseline design, expected sensitivity and resolution, and describe in detail the main scientific topics that will be addressed by this new facility and its initial phase SWGO-A. We show that SWGO will have a transformational impact on a wide range of topics from cosmic-ray acceleration and transport to the nature of dark matter. SWGO represents a key piece of infrastructure for multi-messenger astronomy in the next decade, with strong scientific synergies with the nearby CTA Observatory.
Submitted 25 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
New Physics Search at the CEPC: a General Perspective
Authors:
Stefan Antusch,
Peter Athron,
Daniele Barducci,
Long Chen,
Mingshui Chen,
Xiang Chen,
Huajie Cheng,
Kingman Cheung,
Joao Guimaraes da Costa,
Arindam Das,
Frank F. Deppisch,
P. S. Bhupal Dev,
Xiaokang Du,
Yong Du,
Yaquan Fang,
Andrew Fowlie,
Yu Gao,
Bruce Mellado Garcia,
Shao-Feng Ge,
Jiayin Gu,
Yu-Chen Guo,
Jan Hajer,
Chengcheng Han,
Tao Han,
Sven Heinemeyer
, et al. (68 additional authors not shown)
Abstract:
The Circular Electron-Positron Collider (CEPC), a proposed next-generation Higgs factory, provides new opportunities to explore physics beyond the Standard Model (SM). With its clean electron-positron collision environment and the ability to collect large samples of Higgs, W, and Z bosons, the CEPC enables precision measurements and searches for new physics. This white paper outlines the CEPC's discovery potential, including studies of exotic decays of the Higgs, Z, and top quarks, dark matter and dark sector phenomena, long-lived particles, supersymmetry, and neutrino-related signatures. Advanced detector technologies and reconstruction techniques, such as one-to-one correspondence reconstruction and jet origin identification, significantly improve sensitivity to rare and weakly interacting processes. The CEPC is particularly well suited to probe the electroweak phase transition and test models of electroweak baryogenesis and dark sector interactions. In addition, global fit analyses highlight the CEPC's complementary role in constraining a wide range of new physics scenarios. These features position the CEPC as a powerful tool for exploring the next frontier in fundamental particle physics in the post-Higgs discovery era.
Submitted 30 May, 2025;
originally announced May 2025.
-
LLM-Synth4KWS: Scalable Automatic Generation and Synthesis of Confusable Data for Custom Keyword Spotting
Authors:
Pai Zhu,
Quan Wang,
Dhruuv Agarwal,
Kurt Partridge
Abstract:
Custom keyword spotting (KWS) allows detecting user-defined spoken keywords from streaming audio. This is achieved by comparing the embeddings from voice enrollments and input audio. State-of-the-art custom KWS models are typically trained contrastively using utterances whose keywords are randomly sampled from the training dataset. These KWS models often struggle with confusing keywords, such as "blue" versus "glue". This paper introduces an effective way to augment the training with confusable utterances where keywords are generated and grouped from large language models (LLMs), and speech signals are synthesized with diverse speaking styles from text-to-speech (TTS) engines. To better measure user experience on confusable KWS, we define a new northstar metric using the average area under the DET curve from confusable groups (c-AUC). Featuring high scalability and zero labor cost, the proposed method improves AUC by 3.7% and c-AUC by 11.3% on the Speech Commands testing set.
Submitted 28 May, 2025;
originally announced May 2025.
-
Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects
Authors:
Chengyan Wu,
Yiqiang Cai,
Yang Liu,
Pengxu Zhu,
Yun Xue,
Ziwei Gong,
Julia Hirschberg,
Bolei Ma
Abstract:
While text-based emotion recognition methods have achieved notable success, real-world dialogue systems often demand a more nuanced emotional understanding than any single modality can offer. Multimodal Emotion Recognition in Conversations (MERC) has thus emerged as a crucial direction for enhancing the naturalness and emotional understanding of human-computer interaction. Its goal is to accurately recognize emotions by integrating information from various modalities such as text, speech, and visual signals.
This survey offers a systematic overview of MERC, including its motivations, core tasks, representative methods, and evaluation strategies. We further examine recent trends, highlight key challenges, and outline future directions. As interest in emotionally intelligent systems grows, this survey provides timely guidance for advancing MERC research.
Submitted 26 May, 2025;
originally announced May 2025.
-
Single-shot 3D characterization the spatiotemporal optical vortex via a spatiotemporal wavefront sensor (STWFS)
Authors:
Xiuyu Yao,
Ping Zhu,
Youjian Yi,
Zezhao Gong,
Dongjun Zhang,
Ailin Guo,
Fucai Ding,
Xiao Liang,
Xuejie Zhang,
Meizhi Sun,
Qiang Zhang,
Miaoyan Tong,
Lijie Cui,
Hailun Zen,
Xinglong Xie,
Jianqiang Zhu
Abstract:
The advent of spatiotemporal wave packets (STWPs), represented by spatiotemporal optical vortices (STOVs), has paved the way for new exploration in optics and photonics. To date, despite considerable efforts, a comprehensive and efficient practical means of characterizing wave packets with such complex structures is still lacking. In this study, we introduce a new method designed to achieve high-precision and high-throughput spatiotemporal wave packet measurements using a user-friendly setup. This method is based on a quadriwave lateral shearing interferometric wavefront sensor that utilizes wavelength division multiplexing, termed the "spatiotemporal wavefront sensor (STWFS)." Using this method, we have fabricated a compact prototype with $295 \times 295$ spatial pixels $\times$ 36 wavelength channels of 0.5 nm spectral resolution in a single frame. This STWFS enabled, for the first time, single-shot self-referenced spatiotemporal three-dimensional (3D) optical field characterizations of STOV pulses with transverse orbital angular momenta L of 1 and 2, and obtained the dynamic visualization of the focused propagation of STOV pulses. Furthermore, the STWFS provides a 1.87 nm (0.95%) root mean square (RMS) absolute accuracy for spatiotemporal phase reconstruction. This achievement represents the highest performance compared with other three-dimensional spatiotemporal metrology methods. As a spatiotemporal optical field characterization method, the STWFS offers ultrafast 3D diagnostics, contributing to spatiotemporal photonics and broader applications across different fields, such as light-matter interactions and optical communications.
Submitted 22 May, 2025;
originally announced May 2025.
-
GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
Authors:
Harry Zhang,
Kurt Partridge,
Pai Zhu,
Neng Chen,
Hyun Jin Park,
Dhruuv Agarwal,
Quan Wang
Abstract:
Spoken Keyword Spotting (KWS) is the task of distinguishing between the presence and absence of a keyword in audio. The accuracy of a KWS model hinges on its ability to correctly classify examples close to the keyword and non-keyword boundary. These boundary examples are often scarce in training data, limiting model performance. In this paper, we propose a method to systematically generate adversarial examples close to the decision boundary by making insertion/deletion/substitution edits on the keyword's graphemes. We evaluate this technique on held-out data for a popular keyword and show that the technique improves AUC on a dataset of synthetic hard negatives by 61% while maintaining quality on positives and ambient negative audio data.
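As a rough illustration of the insertion/deletion/substitution edits described in this abstract, the sketch below enumerates every string exactly one edit away from a keyword. It is not the authors' implementation: the function name `grapheme_edits`, the lowercase ASCII alphabet, and the treatment of each character as one grapheme are all assumptions made for the example.

```python
import string


def grapheme_edits(keyword, alphabet=string.ascii_lowercase):
    """Return the sorted set of strings one edit away from `keyword`.

    Hypothetical helper: each character is treated as a single grapheme,
    and `alphabet` is an assumed symbol set, not taken from the paper.
    """
    variants = set()
    n = len(keyword)

    # Insertions: place every symbol at every position, including both ends.
    for i in range(n + 1):
        for ch in alphabet:
            variants.add(keyword[:i] + ch + keyword[i:])

    for i in range(n):
        # Deletion: drop the grapheme at position i.
        variants.add(keyword[:i] + keyword[i + 1:])
        # Substitutions: replace the grapheme at position i with a different one.
        for ch in alphabet:
            if ch != keyword[i]:
                variants.add(keyword[:i] + ch + keyword[i + 1:])

    variants.discard(keyword)  # defensive; no single edit reproduces the keyword
    return sorted(variants)


candidates = grapheme_edits("blue")
```

Here `candidates` contains near-misses such as "glue" (substitution), "blu" (deletion), and "blues" (insertion). Turning such text variants into audio hard negatives would need a separate synthesis step, which this sketch does not cover.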
Submitted 24 May, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration
Authors:
Pengyan Zhu,
Tingting Yang
Abstract:
Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation. Generative large language models (LLMs) are gradually becoming key enablers for the integration of semantic communication and computation. However, due to the limited computational resources of edge devices and the increasing complexity of heterogeneous terminal access, existing centralized inference approaches fail to meet the dual demands of response efficiency and data privacy in edge-side inference tasks. To address these challenges, this paper proposes a novel collaborative inference architecture that integrates cloud-based LLMs with edge-deployed small language models (SLMs), enabling dynamic scheduling and sharing of semantic-level intermediate states, and establishing a unified computation-communication paradigm tailored for 6G networks. Specifically, a key-value (KV) cache reuse mechanism is introduced to enhance the semantic understanding of edge models through contextual guidance from the cloud, while significantly reducing edge-side computational and storage overhead. Furthermore, a cross-node parallel scheduling mechanism is proposed to achieve asynchronous coordination between model state loading and decoding computation, thereby improving edge responsiveness. In addition, we investigate layer alignment and representation compression strategies between heterogeneous models to alleviate the communication burden on the edge. Experimental results demonstrate that the proposed architecture exhibits superior adaptability and scalability in terms of inference latency, system stability, and concurrent processing capacity.
Submitted 20 May, 2025;
originally announced May 2025.
-
DecIF: Improving Instruction-Following through Meta-Decomposition
Authors:
Tingfeng Hui,
Pengyu Zhu,
Bowen Ping,
Ling Tang,
Guanting Dong,
Yaqi Zhang,
Sen Su
Abstract:
Instruction-following has emerged as a crucial capability for large language models (LLMs). However, existing approaches often rely on pre-existing documents or external resources to synthesize instruction-following data, which limits their flexibility and generalizability. In this paper, we introduce DecIF, a fully autonomous, meta-decomposition guided framework that generates diverse and high-quality instruction-following data using only LLMs. DecIF is grounded in the principle of decomposition. For instruction generation, we guide LLMs to iteratively produce various types of meta-information, which are then combined with response constraints to form well-structured and semantically rich instructions. We further utilize LLMs to detect and resolve potential inconsistencies within the generated instructions. Regarding response generation, we decompose each instruction into atomic-level evaluation criteria, enabling rigorous validation and the elimination of inaccurate instruction-response pairs. Extensive experiments across a wide range of scenarios and settings demonstrate DecIF's superior performance on instruction-following tasks. Further analysis highlights its strong flexibility, scalability, and generalizability in automatically synthesizing high-quality instruction data.
Submitted 10 June, 2025; v1 submitted 20 May, 2025;
originally announced May 2025.
-
SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs
Authors:
Le Cheng,
Peican Zhu,
Yangming Guo,
Chao Gao,
Zhen Wang,
Keke Tang
Abstract:
Source detection on graphs has demonstrated high efficacy in identifying rumor origins. Despite advances in machine learning-based methods, many fail to capture intrinsic dynamics of rumor propagation. In this work, we present SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs, which harnesses the recent success of the state space model Mamba, known for its superior global modeling capabilities and computational efficiency, to address this challenge. Specifically, we first employ hypergraphs to model high-order interactions within social networks. Subsequently, temporal network snapshots generated during the propagation process are sequentially fed in reverse order into Mamba to infer underlying propagation dynamics. Finally, to empower the sequential model to effectively capture propagation patterns while integrating structural information, we propose a novel graph-aware state update mechanism, wherein the state of each node is propagated and refined by both temporal dependencies and topological context. Extensive evaluations on eight datasets demonstrate that SourceDetMamba consistently outperforms state-of-the-art approaches.
Submitted 4 June, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
HyperDet: Source Detection in Hypergraphs via Interactive Relationship Construction and Feature-rich Attention Fusion
Authors:
Le Cheng,
Peican Zhu,
Yangming Guo,
Keke Tang,
Chao Gao,
Zhen Wang
Abstract:
Hypergraphs offer superior modeling capabilities for social networks, particularly in capturing group phenomena that extend beyond pairwise interactions in rumor propagation. Existing approaches in rumor source detection predominantly focus on dyadic interactions, which inadequately address the complexity of more intricate relational structures. In this study, we present a novel approach for Source Detection in Hypergraphs (HyperDet) via Interactive Relationship Construction and Feature-rich Attention Fusion. Specifically, our methodology employs an Interactive Relationship Construction module to accurately model both the static topology and dynamic interactions among users, followed by the Feature-rich Attention Fusion module, which autonomously learns node features and discriminates between nodes using a self-attention mechanism, thereby effectively learning node representations under the framework of accurately modeled higher-order relationships. Extensive experimental validation confirms the efficacy of our HyperDet approach, showcasing its superiority relative to current state-of-the-art methods.
Submitted 4 June, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
-
A Hybrid Prior Bayesian Method for Combining Domestic Real-World Data and Overseas Data in Global Drug Development
Authors:
Keer Chen,
Zengyue Zheng,
Pengfei Zhu,
Shuping Jiang,
Nan Li,
Jumin Deng,
Pingyan Chen,
Zhenyu Wu,
Ying Wu
Abstract:
Background: Hybrid clinical trial design integrates randomized controlled trials (RCTs) with real-world data (RWD) to enhance efficiency through dynamic incorporation of external data. Existing methods like the Meta-Analytic Predictive Prior (MAP) inadequately control data heterogeneity, adjust baseline discrepancies, or optimize dynamic borrowing proportions, introducing bias and limiting applications in bridging trials and multi-regional clinical trials (MRCTs).
Objective: This study proposes a novel hybrid Bayesian framework (EQPS-rMAP) to address heterogeneity and bias in multi-source data integration, validated through simulations and retrospective case analyses of risankizumab's efficacy in moderate-to-severe plaque psoriasis.
Design and Methods: EQPS-rMAP eliminates baseline covariate discrepancies via propensity score stratification, constructs stratum-specific MAP priors to dynamically adjust external data weights, and introduces equivalence probability weights to quantify data conflict risks. Performance was evaluated across six simulated scenarios (heterogeneity differences, baseline shifts) and real-world case analyses, comparing it with traditional methods (MAP, PSMAP, EBMAP) on estimation bias, type I error control, and sample size requirements.
Results: Simulations show EQPS-rMAP maintains estimation robustness under significant heterogeneity while reducing sample size demands and enhancing trial efficiency. Case analyses confirm superior external bias control and accuracy compared to conventional approaches.
Conclusion and Significance: EQPS-rMAP provides empirical evidence for hybrid clinical designs. By resolving baseline-heterogeneity conflicts through adaptive mechanisms, it enables reliable integration of external and real-world data in bridging trials, MRCTs, and post-marketing studies, broadening applicability without compromising rigor.
Submitted 18 May, 2025;
originally announced May 2025.
-
Q-space Guided Collaborative Attention Translation Network for Flexible Diffusion-Weighted Images Synthesis
Authors:
Pengli Zhu,
Yingji Fu,
Nanguang Chen,
Anqi Qiu
Abstract:
In this study, we propose a novel Q-space Guided Collaborative Attention Translation Network (Q-CATN) for multi-shell, high-angular resolution DWI (MS-HARDI) synthesis from flexible q-space sampling, leveraging the commonly acquired structural MRI data. Q-CATN employs a collaborative attention mechanism to effectively extract complementary information from multiple modalities and dynamically adjust its internal representations based on flexible q-space information, eliminating the need for fixed sampling schemes. Additionally, we introduce a range of task-specific constraints to preserve anatomical fidelity in DWI, enabling Q-CATN to accurately learn the intrinsic relationships between directional DWI signal distributions and q-space. Extensive experiments on the Human Connectome Project (HCP) dataset demonstrate that Q-CATN outperforms existing methods, including 1D-qDL, 2D-qDL, MESC-SD, and QGAN, in estimating parameter maps and fiber tracts both quantitatively and qualitatively, while preserving fine-grained details. Notably, its ability to accommodate flexible q-space sampling highlights its potential as a promising toolkit for clinical and research applications. Our code is available at https://github.com/Idea89560041/Q-CATN.
Submitted 14 May, 2025;
originally announced May 2025.
-
The Geography of Transportation Cybersecurity: Visitor Flows, Industry Clusters, and Spatial Dynamics
Authors:
Yuhao Wang,
Kailai Wang,
Songhua Hu,
Yunpeng Zhang,
Gino Lim,
Pengyu Zhu
Abstract:
The rapid evolution of the transportation cybersecurity ecosystem, encompassing cybersecurity, automotive, and transportation and logistics sectors, will lead to the formation of distinct spatial clusters and visitor flow patterns across the US. This study examines the spatiotemporal dynamics of visitor flows, analyzing how socioeconomic factors shape industry clustering and workforce distribution within these evolving sectors. To model and predict visitor flow patterns, we develop a BiTransGCN framework, integrating an attention-based Transformer architecture with a Graph Convolutional Network backbone. By integrating AI-enabled forecasting techniques with spatial analysis, this study improves our ability to track, interpret, and anticipate changes in industry clustering and mobility trends, thereby supporting strategic planning for a secure and resilient transportation network. It offers a data-driven foundation for economic planning, workforce development, and targeted investments in the transportation cybersecurity ecosystem.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
High-Frequency Prior-Driven Adaptive Masking for Accelerating Image Super-Resolution
Authors:
Wei Shang,
Dongwei Ren,
Wanying Zhang,
Pengfei Zhu,
Qinghua Hu,
Wangmeng Zuo
Abstract:
The primary challenge in accelerating image super-resolution lies in reducing computation while maintaining performance and adaptability. Motivated by the observation that high-frequency regions (e.g., edges and textures) are most critical for reconstruction, we propose a training-free adaptive masking module for acceleration that dynamically focuses computation on these challenging areas. Specifically, our method first extracts high-frequency components via Gaussian blur subtraction and adaptively generates binary masks using K-means clustering to identify regions requiring intensive processing. Our method can be easily integrated with both CNNs and Transformers. For CNN-based architectures, we replace standard $3 \times 3$ convolutions with an unfold operation followed by $1 \times 1$ convolutions, enabling pixel-wise sparse computation guided by the mask. For Transformer-based models, we partition the mask into non-overlapping windows and selectively process tokens based on their average values. During inference, unnecessary pixels or windows are pruned, significantly reducing computation. Moreover, our method supports dilation-based mask adjustment to control the processing scope without retraining, and is robust to unseen degradations (e.g., noise, compression). Extensive experiments on benchmarks demonstrate that our method reduces FLOPs by 24--43% for state-of-the-art models (e.g., CARN, SwinIR) while achieving comparable or better quantitative metrics. The source code is available at https://github.com/shangwei5/AMSR
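The masking step described in this abstract (Gaussian blur subtraction followed by K-means) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the released AMSR code: the function names, the 3-tap kernel with sigma 1, edge replication at the borders, the use of the absolute residual, and the choice of exactly two clusters are all assumptions made for the example.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img, radius=1, sigma=1.0):
    """Separable Gaussian blur on a 2D list, replicating edge pixels (assumed)."""
    k = gaussian_kernel(radius, sigma)
    h, w = len(img), len(img[0])
    offsets = range(-radius, radius + 1)

    def clamp(v, hi):
        return max(0, min(hi, v))

    # Horizontal pass, then vertical pass.
    tmp = [[sum(k[j + radius] * img[y][clamp(x + j, w - 1)] for j in offsets)
            for x in range(w)] for y in range(h)]
    return [[sum(k[j + radius] * tmp[clamp(y + j, h - 1)][x] for j in offsets)
             for x in range(w)] for y in range(h)]


def kmeans_two(values, iters=20):
    """1D K-means with two clusters, initialized at the min and max values."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not near_hi:  # all values fell into one cluster
            break
        lo = sum(near_lo) / len(near_lo)
        hi = sum(near_hi) / len(near_hi)
    return lo, hi


def high_freq_mask(img):
    """Binary mask: 1 where the blur residual falls in the high-valued cluster."""
    blurred = gaussian_blur(img)
    residual = [[abs(p - q) for p, q in zip(row, brow)]
                for row, brow in zip(img, blurred)]
    lo, hi = kmeans_two([v for row in residual for v in row])
    return [[1 if abs(v - hi) < abs(v - lo) else 0 for v in row]
            for row in residual]


# A vertical step edge: only the pixels next to the edge carry high frequencies.
step_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(6)]
mask = high_freq_mask(step_image)
```

On the step image, the mask marks only the two columns adjacent to the edge, which are the pixels a sparse super-resolution pass would keep under this sketch.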
Submitted 11 May, 2025;
originally announced May 2025.
-
Bi-directional Self-Registration for Misaligned Infrared-Visible Image Fusion
Authors:
Timing Li,
Bing Cao,
Pengfei Zhu,
Bin Xiao,
Qinghua Hu
Abstract:
Acquiring accurately aligned multi-modal image pairs is fundamental for achieving high-quality multi-modal image fusion. To address the lack of ground truth in current multi-modal image registration and fusion methods, we propose a novel self-supervised Bi-directional Self-Registration framework (B-SR). Specifically, B-SR utilizes a proxy data generator (PDG) and an inverse proxy data generator (IPDG) to achieve self-supervised global-local registration. Visible-infrared image pairs with spatially misaligned differences are aligned to obtain global differences through the registration module. The same image pairs are processed by PDG, such as cropping, flipping, stitching, etc., and then aligned to obtain local differences. IPDG converts the obtained local differences into pseudo-global differences, which are used to perform global-local difference consistency with the global differences. Furthermore, aiming at eliminating the effect of modal gaps on the registration module, we design a neighborhood dynamic alignment loss to achieve cross-modal image edge alignment. Extensive experiments on misaligned multi-modal images demonstrate the effectiveness of the proposed method in multi-modal image alignment and fusion against the competing methods. Our code will be publicly available.
Submitted 11 May, 2025;
originally announced May 2025.
-
Versatile Distributed Maneuvering with Generalized Formations using Guiding Vector Fields
Authors:
Yang Lu,
Sha Luo,
Pengming Zhu,
Weijia Yao,
Hector Garcia de Marina,
Xinglong Zhang,
Xin Xu
Abstract:
This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on the consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.
Submitted 9 May, 2025;
originally announced May 2025.
-
SITE: towards Spatial Intelligence Thorough Evaluation
Authors:
Wenqi Wang,
Reuben Tan,
Pengyue Zhu,
Jianwei Yang,
Zhengyuan Yang,
Lijuan Wang,
Andrey Kolobov,
Jianfeng Gao,
Boqing Gong
Abstract:
Spatial intelligence (SI) represents a cognitive ability encompassing the visualization, manipulation, and reasoning about spatial relationships, underpinning disciplines from neuroscience to robotics. We introduce SITE, a benchmark dataset towards SI Thorough Evaluation in a standardized format of multi-choice visual question-answering, designed to assess large vision-language models' spatial intelligence across diverse visual modalities (single-image, multi-image, and video) and SI factors (figural to environmental scales, spatial visualization and orientation, intrinsic and extrinsic, static and dynamic). Our approach to curating the benchmark combines a bottom-up survey of 31 existing datasets and a top-down strategy drawing upon three classification systems in cognitive science, which prompt us to design two novel types of tasks about view-taking and dynamic scenes. Extensive experiments reveal that leading models lag behind human experts, especially in spatial orientation, a fundamental SI factor. Moreover, we demonstrate a positive correlation between a model's spatial reasoning proficiency and its performance on an embodied AI task.
Submitted 8 May, 2025;
originally announced May 2025.
-
Increasing the density limit with ECRH-assisted Ohmic start-up on EAST
Authors:
Jiaxing Liu,
Ping Zhu,
Dominique Franck Escande,
Wenbin Liu,
Shiwei Xue,
Xin Lin,
Panjun Tang,
Liang Wang,
Ning Yan,
Jinju Yang,
Yanmin Duan,
Kai Jia,
Zhenwei Wu,
Yunxin Cheng,
Ling Zhang,
Jinping Qian,
Rui Ding,
Ruijie Zhou,
the EAST team
Abstract:
High plasma density operation is crucial for a tokamak to achieve energy breakeven and a burning plasma. However, there is often an empirical upper limit of electron density in tokamak operation, namely the Greenwald density limit $n_G$, above which tokamaks generally disrupt. Achieving high-density operations above the density limit has been a long-standing challenge in magnetic confinement fusion research. Here, we report experimental results on the EAST tokamak achieving line-averaged electron densities in the range of 1.3 $n_G$ to 1.65 $n_G$, while the usual range in EAST is (0.8-1.0)$n_G$. This is achieved with ECRH-assisted Ohmic start-up and a sufficiently high initial neutral density. The approach is motivated by and consistent with predictions of a recent plasma-wall self-organization (PWSO) theory, namely that increasing ECRH power or pre-filled gas pressure leads to lower plasma temperatures around the divertor target and higher density limits. In addition, the experiments are shown to operate in the density-free regime predicted by the PWSO model. These results suggest a promising scheme for substantially increasing the density limit in tokamaks, a critical advancement toward achieving a burning plasma.
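For readers unfamiliar with the normalization used in this abstract, the sketch below evaluates the standard empirical Greenwald scaling, $n_G\,[10^{20}\,\mathrm{m^{-3}}] = I_p\,[\mathrm{MA}] / (\pi a^2\,[\mathrm{m^2}])$, and the corresponding Greenwald fraction. The function names and the plasma current and minor radius in the usage lines are illustrative assumptions, not values taken from the paper.

```python
import math


def greenwald_limit(ip_ma: float, minor_radius_m: float) -> float:
    """Greenwald density limit in units of 1e20 m^-3.

    ip_ma: plasma current in MA; minor_radius_m: minor radius in metres.
    """
    return ip_ma / (math.pi * minor_radius_m ** 2)


def greenwald_fraction(line_avg_density_1e20: float,
                       ip_ma: float,
                       minor_radius_m: float) -> float:
    """Line-averaged density (1e20 m^-3) expressed as a fraction of n_G."""
    return line_avg_density_1e20 / greenwald_limit(ip_ma, minor_radius_m)


# Hypothetical discharge parameters, chosen only to exercise the functions.
n_g = greenwald_limit(ip_ma=0.5, minor_radius_m=0.45)
print(f"n_G = {n_g:.3f} x 1e20 m^-3")
print(f"fraction at 1.0e20 m^-3: {greenwald_fraction(1.0, 0.5, 0.45):.2f}")
```

A fraction above 1 corresponds to operation beyond the Greenwald limit, which is the regime the abstract reports.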
Submitted 5 May, 2025;
originally announced May 2025.
-
Measuring short-range correlations and quasi-elastic cross sections in A(e,e') at x>1 and modest Q$^2$
Authors:
Y. P. Zhang,
Z. H. Ye,
D. Nguyen,
P. Aguilera,
Z. Ahmed,
H. Albataineh,
K. Allada,
B. Anderson,
D. Anez,
K. Aniol,
J. Annand,
J. Arrington,
T. Averett,
H. Baghdasaryan,
X. Bai,
A. Beck,
S. Beck,
V. Bellini,
F. Benmokhtar,
A. Camsonne,
C. Chen,
J. -P. Chen,
K. Chirapatpimol,
E. Cisbani,
S. Covrig Dusa
, et al. (74 additional authors not shown)
Abstract:
We present results from the Jefferson Lab E08-014 experiment, investigating short-range correlations (SRC) through measurements of absolute inclusive quasi-elastic cross sections and their ratios. This study utilized 3.356 GeV electrons scattered off targets including $^2$H, $^3$He, $^4$He, $^{12}$C, $^{40}$Ca, and $^{48}$Ca, at modest momentum transfers ($1.3 < Q^2 \leq 2$ GeV$^2$). Kinematics were selected to enhance the cross-section contribution from high-momentum nucleons originating from the strongly interacting, short-distance components of two-nucleon SRCs (2N-SRCs), known to exhibit a universal structure across both light and heavy nuclei. We analyzed the A/$^2$H ratio within the region dominated by 2N-SRCs to characterize the nuclear dependence of SRC contributions across various nuclei. Additionally, the A/$^3$He ratio was examined at kinematics sensitive to nucleons with even higher momentum, aiming to identify signals indicative of three-nucleon SRCs (3N-SRCs). The traditional analysis method in the expected 3N-SRC region ($x > 2$) did not yield a clear plateau; instead, the data diverged from the predicted 3N-SRC behavior as momentum transfer increased. However, when analyzed in terms of the struck nucleon's light-cone momentum, the data exhibited the opposite trend, progressively approaching the predicted 3N-SRC plateau. These observations suggest that future measurements at higher energies may facilitate a definitive isolation and identification of 3N-SRCs.
Submitted 24 April, 2025;
originally announced April 2025.
-
GenShin:geometry-enhanced structural graph embodies binding pose can better predicting compound-protein interaction affinity
Authors:
Pingfei Zhu,
Chenyang Zhao,
Haishi Zhao,
Bo Yang
Abstract:
AI-powered drug discovery typically relies on the successful prediction of compound-protein interactions, which are pivotal for the evaluation of designed compound molecules in structure-based drug design and represent a core challenge in the field.
However, accurately predicting compound-protein affinity via regression models usually requires adequate-binding poses, which are derived from costly and complex experimental methods or time-consuming simulations with docking software. In response, we introduce the GenShin model, which constructs a geometry-enhanced structural graph module that separately extracts additional features from proteins and compounds. Consequently, it attains an accuracy on par with mainstream models in predicting compound-protein affinities, while eliminating the need for adequate-binding poses as input. Our experimental findings demonstrate that the GenShin model vastly outperforms other models that rely on non-input docking conformations, matching, or in some cases even exceeding, the performance of those requiring adequate-binding poses. Further experiments indicate that our GenShin model is more robust to inadequate-binding poses, affirming its higher suitability for real-world drug discovery scenarios. We hope our work will inspire more endeavors to bridge the gap between AI models and practical drug discovery challenges.
Submitted 16 March, 2025;
originally announced April 2025.
-
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency
Authors:
Mengshi Qi,
Pengfei Zhu,
Xiangtai Li,
Xiaoyang Bi,
Lu Qi,
Huadong Ma,
Ming-Hsuan Yang
Abstract:
Given a single labeled example, in-context segmentation aims to segment corresponding objects. This setting, known as one-shot segmentation in few-shot learning, explores the segmentation model's generalization ability and has been applied to various vision tasks, including scene understanding and image/video editing. While recent Segment Anything Models have achieved state-of-the-art results in interactive segmentation, these approaches are not directly applicable to in-context segmentation. In this work, we propose the Dual Consistency SAM (DC-SAM) method based on prompt-tuning to adapt SAM and SAM2 for in-context segmentation of both images and videos. Our key insight is to enhance the features of SAM's prompt encoder in segmentation by providing high-quality visual prompts. When generating a mask prior, we fuse the SAM features to better align the prompt encoder. Then, we design a cycle-consistent cross-attention on fused features and initial visual prompts. Next, a dual-branch design is provided by using the discriminative positive and negative prompts in the prompt encoder. Furthermore, we design a simple mask-tube training strategy to adopt our proposed dual consistency method into the mask tube. Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support of SAM2. Given the absence of in-context segmentation in the video domain, we manually curate and construct the first benchmark from existing video segmentation datasets, named In-Context Video Object Segmentation (IC-VOS), to better assess the in-context capability of the model. Extensive experiments demonstrate that our method achieves 55.5 (+1.4) mIoU on COCO-20i, 73.0 (+1.1) mIoU on PASCAL-5i, and a J&F score of 71.52 on the proposed IC-VOS benchmark. Our source code and benchmark are available at https://github.com/zaplm/DC-SAM.
Submitted 17 April, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
EdgePrompt: A Distributed Key-Value Inference Framework for LLMs in 6G Networks
Authors:
Jiahong Ning,
Pengyan Zhu,
Ce Zheng,
Gary Lee,
Sumei Sun,
Tingting Yang
Abstract:
As sixth-generation (6G) networks advance, large language models (LLMs) are increasingly integrated into 6G infrastructure to enhance network management and intelligence. However, traditional LLM architectures struggle to meet the stringent latency and security requirements of 6G, especially as increasing sequence length leads to greater task complexity. This paper proposes EdgePrompt, a cloud-edge collaborative framework based on a hierarchical attention splicing mechanism. EdgePrompt employs distributed key-value (KV) pair optimization techniques to accelerate inference and adapt to network conditions. Additionally, to reduce the risk of data leakage, EdgePrompt incorporates a privacy-preserving strategy by isolating sensitive information during processing. Experiments on a public dataset show that EdgePrompt effectively improves inference throughput and reduces latency, providing a reliable solution for LLM deployment in 6G environments.
Submitted 15 April, 2025;
originally announced April 2025.
-
Finite-Precision Conjugate Gradient Method for Massive MIMO Detection
Authors:
Yiming Fang,
Li Chen,
Changsheng You,
Dingzhu Wen,
Pengcheng Zhu
Abstract:
The implementation of the conjugate gradient (CG) method for massive MIMO detection is computationally challenging, especially for a large number of users and correlated channels. In this paper, we propose a low computational complexity CG detection from a finite-precision perspective. First, we develop a finite-precision CG (FP-CG) detection to mitigate the computational bottleneck of each CG iteration and provide the attainable accuracy, convergence, and computational complexity analysis to reveal the impact of finite-precision arithmetic. A practical heuristic is presented to select suitable precisions. Then, to further reduce the number of iterations, we propose a joint finite-precision and block-Jacobi preconditioned CG (FP-BJ-CG) detection. The corresponding performance analysis is also provided. Finally, simulation results validate the theoretical insights and demonstrate the superiority of the proposed detection.
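To make the setting of this abstract concrete, the sketch below runs a textbook conjugate gradient solver on the Hermitian positive-definite system that arises in linear MMSE detection, $(H^{H}H + \sigma^{2}I)\,x = H^{H}y$, once in double and once in single precision. This is an illustration under stated assumptions, not the authors' FP-CG or FP-BJ-CG algorithm: the MMSE formulation, the use of `complex64` as a stand-in for finite precision, and all function names are choices made here.

```python
import numpy as np


def mmse_system(H, y, noise_var):
    """Build A = H^H H + noise_var * I and b = H^H y for MMSE detection."""
    num_users = H.shape[1]
    A = H.conj().T @ H + noise_var * np.eye(num_users)
    b = H.conj().T @ y
    return A, b


def cg_solve(A, b, num_iters, dtype=np.complex128):
    """Plain conjugate gradient for Hermitian positive-definite A.

    All arithmetic is carried out in `dtype`, so passing np.complex64
    emulates a lower-precision implementation of the same iteration.
    """
    A = A.astype(dtype)
    b = b.astype(dtype)
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A x with x = 0
    p = r.copy()                      # initial search direction
    rs_old = np.vdot(r, r).real
    for _ in range(num_iters):
        if rs_old == 0:               # exact solution already reached
            break
        Ap = A @ p
        alpha = rs_old / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


def demo(seed=0, num_antennas=64, num_users=8, noise_var=0.1):
    """Compare double- and single-precision CG against a direct solve."""
    rng = np.random.default_rng(seed)
    shape = (num_antennas, num_users)
    H = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    y = rng.standard_normal(num_antennas) + 1j * rng.standard_normal(num_antennas)
    A, b = mmse_system(H, y, noise_var)
    x_ref = np.linalg.solve(A, b)
    for dtype in (np.complex128, np.complex64):
        x = cg_solve(A, b, num_iters=num_users, dtype=dtype)
        rel_err = np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref)
        print(dtype.__name__, rel_err)


demo()
```

The gap between the two printed errors gives a rough feel for the attainable-accuracy question the abstract raises; the paper's own analysis and precision-selection heuristic are not reproduced here.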
Submitted 13 April, 2025;
originally announced April 2025.
-
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Authors:
Team Seawead,
Ceyuan Yang,
Zhijie Lin,
Yang Zhao,
Shanchuan Lin,
Zhibei Ma,
Haoyuan Guo,
Hao Chen,
Lu Qi,
Sen Wang,
Feng Cheng,
Feilong Zuo,
Xuejiao Zeng,
Ziyan Yang,
Fangyuan Kong,
Meng Wei,
Zhiwu Qing,
Fei Xiao,
Tuyen Hoang,
Siyu Zhang,
Peihao Zhu,
Qi Zhao,
Jiangqiao Yan,
Liangke Gui,
Sheng Bi
, et al. (30 additional authors not shown)
Abstract:
This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting. This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained on substantially greater GPU resources, and (2) our model, which exhibits strong generalization ability, can be effectively adapted across a wide range of downstream applications either by lightweight fine-tuning or continued training. See the project page at https://seaweed.video/
Submitted 4 May, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Exploration of Approaches for Robustness and Safety in a Low Code Open Environment for Factory Automation
Authors:
Gustavo Quiros A.,
Yi Peng Zhu,
Tao Cui,
Shaokai Lin,
Marten Lohstroh,
Edward A. Lee
Abstract:
This report is a compilation of technical knowledge and concepts that were produced by the authors and additional contributors in the context of the collaboration projects "Abstraction Requirements for Language of Choice in Industrial Automation" (FY21-22) and "Approaches for Robust and Safe Low-Code" (FY23-24) from Siemens Technology and the University of California, Berkeley. The primary objective of these projects was to assess Siemens Open Industrial Edge (OIE) engineering capabilities by defining a concept that ensures the satisfaction of coordination and safety requirements when using disparate OIE modules. The objective was to use the Lingua Franca (LF) coordination language to demonstrate how to address challenges in: 1. engineering modular, distributed, and flexible automation solutions that ensure, by design, robust and safe operation; 2. the use of IEC 61499, the event-driven execution model for specifying the execution order of OIE modules (defined as function blocks); 3. supporting large-scale distributed OIE automation solutions; and eventually 4. defining optimal solutions with synchronization and time-optimal mechanisms.
Submitted 5 April, 2025;
originally announced April 2025.
-
EOOD: Entropy-based Out-of-distribution Detection
Authors:
Guide Yang,
Chao Hou,
Weilong Peng,
Xiang Fang,
Yongwei Nie,
Peican Zhu,
Keke Tang
Abstract:
Deep neural networks (DNNs) often exhibit overconfidence when encountering out-of-distribution (OOD) samples, posing significant challenges for deployment. Since DNNs are trained on in-distribution (ID) datasets, the information flow of ID samples through DNNs inevitably differs from that of OOD samples. In this paper, we propose an Entropy-based Out-Of-distribution Detection (EOOD) framework. EOOD first identifies the specific block where the information flow differences between ID and OOD samples are most pronounced, using both ID and pseudo-OOD samples. It then calculates the conditional entropy on the selected block as the OOD confidence score. Comprehensive experiments conducted across various ID and OOD settings demonstrate the effectiveness of EOOD in OOD detection and its superiority over state-of-the-art methods.
Submitted 4 April, 2025;
originally announced April 2025.
-
A YOLO-Based Semi-Automated Labeling Approach to Improve Fault Detection Efficiency in Railroad Videos
Authors:
Dylan Lester,
James Gao,
Samuel Sutphin,
Pingping Zhu,
Husnu Narman,
Ammar Alzarrad
Abstract:
Manual labeling for large-scale image and video datasets is often time-intensive, error-prone, and costly, posing a significant barrier to efficient machine learning workflows in fault detection from railroad videos. This study introduces a semi-automated labeling method that utilizes a pre-trained You Only Look Once (YOLO) model to streamline the labeling process and enhance fault detection accuracy in railroad videos. By initiating the process with a small set of manually labeled data, our approach iteratively trains the YOLO model, using each cycle's output to improve model accuracy and progressively reduce the need for human intervention.
To facilitate easy correction of model predictions, we developed a system to export YOLO's detection data as an editable text file, enabling rapid adjustments when detections require refinement. This approach decreases labeling time from an average of 2 to 4 minutes per image to 30 seconds to 2 minutes, effectively minimizing labor costs and labeling errors. Unlike costly AI-based labeling solutions on paid platforms, our method provides a cost-effective alternative for researchers and practitioners handling large datasets in fault detection and other detection-based machine learning applications.
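The export step described in this abstract can be pictured with the sketch below, which writes pixel-space boxes as lines of editable text and reads them back. It assumes the widely used YOLO label convention (`class_id x_center y_center width height`, each coordinate normalized by the image size); the abstract does not state which file format the authors use, and every function name here is hypothetical.

```python
from typing import List, Tuple

# (class_id, x_min, y_min, x_max, y_max) with coordinates in pixels
Detection = Tuple[int, float, float, float, float]


def to_yolo_line(det: Detection, img_w: int, img_h: int) -> str:
    """Format one pixel-space box as a normalized YOLO label line."""
    class_id, x_min, y_min, x_max, y_max = det
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line: str, img_w: int, img_h: int) -> Detection:
    """Parse a (possibly hand-edited) label line back into a pixel-space box."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    x_max = (x_center + width / 2) * img_w
    y_max = (y_center + height / 2) * img_h
    return (class_id, x_min, y_min, x_max, y_max)


def export_labels(dets: List[Detection], img_w: int, img_h: int) -> str:
    """Join all detections for one image into the text of a label file."""
    return "\n".join(to_yolo_line(d, img_w, img_h) for d in dets)


# Hypothetical detections for a 400x500 image.
print(export_labels([(0, 100, 50, 300, 250), (1, 10, 20, 60, 120)], 400, 500))
```

Because each detection is a single short line, a reviewer can correct a class index or nudge a box in any text editor before the file is fed back into the next training cycle.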
Submitted 1 April, 2025;
originally announced April 2025.
-
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Authors:
Wencheng Zhu,
Yuexin Wang,
Hongxuan Li,
Pengfei Zhu,
Qinghua Hu
Abstract:
Vision-language models bridge visual and linguistic understanding and have proven to be powerful for video recognition tasks. Existing approaches primarily rely on parameter-efficient fine-tuning of image-text pre-trained models, yet they often suffer from limited interpretability and poor generalization due to inadequate temporal modeling. To address these issues, we propose a simple yet effective video-to-text discretization framework. Our method repurposes the frozen text encoder to construct a visual codebook from video class labels, exploiting the many-to-one contrastive alignment between visual and textual embeddings in multimodal pretraining. This codebook effectively transforms temporal visual data into textual tokens via feature lookups and offers interpretable video representations through explicit video modeling. Then, to enhance robustness against irrelevant or noisy frames, we introduce a confidence-aware fusion module that dynamically weights keyframes by assessing their semantic relevance via the codebook. Furthermore, our method incorporates learnable text prompts to conduct adaptive codebook updates. Extensive experiments on HMDB-51, UCF-101, SSv2, and Kinetics-400 have validated the superiority of our approach, achieving competitive improvements over state-of-the-art methods. The code will be publicly available at https://github.com/isxinxin/VTD-CLIP.
Submitted 24 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
-
BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors
Authors:
Yu Wang,
Junxian Mu,
Hongzhi Huang,
Qilong Wang,
Pengfei Zhu,
Qinghua Hu
Abstract:
Open set recognition (OSR) requires models to classify known samples while detecting unknown samples for real-world applications. Existing studies show impressive progress using unknown samples from auxiliary datasets to regularize OSR models, but they have proved to be sensitive to the selection of such known outliers. In this paper, we discuss the aforementioned problem from a new perspective: Can we regularize OSR models without elaborately selecting auxiliary known outliers? We first empirically and theoretically explore the role of foregrounds and backgrounds in open set recognition and disclose that: 1) backgrounds that correlate with foregrounds would mislead the model and cause failures when it encounters 'partially' known images; 2) backgrounds unrelated to foregrounds can serve as auxiliary known outliers and provide regularization via global average pooling. Based on the above insights, we propose a new method, Background Mix (BackMix), that mixes the foreground of an image with different backgrounds to remove the underlying fore-background priors. Specifically, BackMix first estimates the foreground with class activation maps (CAMs), then randomly replaces image patches with backgrounds from other images to obtain mixed images for training. With backgrounds de-correlated from foregrounds, the open set recognition performance is significantly improved. The proposed method is quite simple to implement, requires no extra operation for inference, and can be seamlessly integrated into almost all of the existing frameworks. The code is released on https://github.com/Vanixxz/BackMix.
Submitted 22 March, 2025;
originally announced March 2025.
-
Dream-IF: Dynamic Relative EnhAnceMent for Image Fusion
Authors:
Xingxin Xu,
Bing Cao,
Yinan Xia,
Pengfei Zhu,
Qinghua Hu
Abstract:
Image fusion aims to integrate comprehensive information from images acquired through multiple sources. However, images captured by diverse sensors often encounter various degradations that can negatively affect fusion quality. Traditional fusion methods generally treat image enhancement and fusion as separate processes, overlooking the inherent correlation between them; notably, the dominant regions in one modality of a fused image often indicate areas where the other modality might benefit from enhancement. Inspired by this observation, we introduce the concept of dominant regions for image enhancement and present a Dynamic Relative EnhAnceMent framework for Image Fusion (Dream-IF). This framework quantifies the relative dominance of each modality across different layers and leverages this information to facilitate reciprocal cross-modal enhancement. By integrating the relative dominance derived from image fusion, our approach supports not only image restoration but also a broader range of image enhancement applications. Furthermore, we employ prompt-based encoding to capture degradation-specific details, which dynamically steer the restoration process and promote coordinated enhancement in both multi-modal image fusion and image enhancement scenarios. Extensive experimental results demonstrate that Dream-IF consistently outperforms its counterparts.
Submitted 13 March, 2025;
originally announced March 2025.
-
NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields
Authors:
Yanyu Zhang,
Dongming Wang,
Jie Xu,
Mengyuan Liu,
Pengxiang Zhu,
Wei Ren
Abstract:
A prior map serves as a foundational reference for localization in context-aware applications such as augmented reality (AR). Providing valuable contextual information about the environment, the prior map is a vital tool for mitigating drift. In this paper, we propose a map-based visual-inertial localization algorithm (NeRF-VIO) with initialization using neural radiance fields (NeRF). Our algorithm utilizes a multilayer perceptron model and redefines the loss function as the geodesic distance on \(SE(3)\), ensuring the invariance of the initialization model under a frame change within \(\mathfrak{se}(3)\). The evaluation demonstrates that our model outperforms existing NeRF-based initialization solutions in both accuracy and efficiency. By integrating a two-stage update mechanism within a multi-state constraint Kalman filter (MSCKF) framework, the state of NeRF-VIO is constrained by both captured images from an onboard camera and rendered images from a pre-trained NeRF model. The proposed algorithm is validated using a real-world AR dataset; the results indicate that our two-stage update pipeline outperforms MSCKF across all data sequences.
Submitted 10 March, 2025;
originally announced March 2025.
-
SSR: A Swapping-Sweeping-and-Rewriting Optimizer for Quantum Circuit Transformation
Authors:
Yunqi Huang,
Xiangzhen Zhou,
Fanxu Meng,
Pengcheng Zhu,
Yu Luo,
Zhenlong Du
Abstract:
Quantum circuit transformation (QCT), necessary for adapting any quantum circuit to the qubit connectivity constraints of a NISQ device, often introduces numerous additional SWAP gates into the original circuit, increasing the circuit depth and thus reducing the success rate of computation. To minimize the depth of QCT circuits, we propose a Swapping-Sweeping-and-Rewriting optimizer. This optimizer rearranges the circuit based on generalized gate commutation rules via a genetic algorithm, extracts subcircuits consisting of CNOT gates using a circuit sweeping technique, and rewrites each subcircuit with a functionally equivalent and depth-optimal circuit generated by an SAT solver. The devised optimizer effectively captures the intrinsic patterns of the QCT circuits, and the experimental results demonstrate that our algorithm can significantly reduce the depth of QCT circuits, by up to 26.68\% and by 12.18\% on average, across all benchmark circuits.
Submitted 27 April, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Gas outflows in two recently quenched galaxies at z = 4 and 7
Authors:
F. Valentino,
K. E. Heintz,
G. Brammer,
K. Ito,
V. Kokorev,
K. E. Whitaker,
A. Gallazzi,
A. de Graaff,
A. Weibel,
B. L. Frye,
P. S. Kamieneski,
S. Jin,
D. Ceverino,
A. Faisst,
M. Farcy,
S. Fujimoto,
S. Gillman,
R. Gottumukkala,
M. Hamadouche,
K. C. Harrington,
M. Hirschmann,
C. K. Jespersen,
T. Kakimoto,
M. Kubo,
C. d. P. Lagos
, et al. (11 additional authors not shown)
Abstract:
Outflows are a key element in the baryon cycle of galaxies, and their properties provide a fundamental test for our models of how star formation quenches in galaxies. Here we report the detection of outflowing gas in two recently quenched, massive ($M_\star\sim10^{10.2}M_\odot$) galaxies at z=4.106 (NS_274) and z=7.276 (RUBIES-UDS-QG-z7) observed with JWST/NIRSpec. The outflows are traced by blue-shifted MgII absorption lines, and in the case of the z=4.1 system, also by FeII and NaI features. The spectra of the two sources are similar to those of local post-starburst galaxies, showing deep Balmer features and minimal star formation on 10 Myr timescales as traced by the lack of bright emission lines, also suggesting the absence of a strong and radiatively efficient AGN. The galaxies' SFHs are consistent with an abrupt quenching of star formation, which continued at rates of $\sim15\,M_\odot$/yr averaged over 100 Myr timescales. Dedicated millimeter observations of NS_274 constrain its dust obscured SFR to $<12\,M_\odot$/yr. Under simple geometrical assumptions, we derive mass loading factors $\lesssim1$ and $>10$ for the z=4.1 and z=7.3 systems, respectively, and similarly different energies carried by the outflows. Supernova feedback can account for the mass and energy of the outflow in NS_274. However, the low mass loading factor and average gas velocity suggest that the observed outflow is likely not the primary factor behind its quenching. SF-related processes seem to be insufficient to explain the extreme mass outflow rate of RUBIES-UDS-QG-z7, which would require an additional ejective mechanism such as an undetected AGN. Finally, the average outflow velocities per unit $M_\star$, SFR, or its surface area are consistent with those of lower-redshift post-starburst galaxies, suggesting that outflows in rapidly quenched galaxies might occur similarly across cosmic time. [Abridged]
Submitted 3 July, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
A merging pair of massive quiescent galaxies at $z=3.44$ in the Cosmic Vine
Authors:
K. Ito,
F. Valentino,
M. Farcy,
G. De Lucia,
C. D. P. Lagos,
M. Hirschmann,
G. Brammer,
A. de Graaff,
D. Blánquez-Sesé,
D. Ceverino,
A. L. Faisst,
F. Fontanot,
S. Gillman,
M. L. Hamadouche,
K. E. Heintz,
S. Jin,
C. K. Jespersen,
M. Kubo,
M. Lee,
G. Magdis,
A. W. S. Man,
M. Onodera,
F. Rizzo,
R. Shimakawa,
M. Tanaka, et al. (4 additional authors not shown)
Abstract:
We report the spectroscopic confirmation of a merging pair of massive quiescent galaxies at $z=3.44$. Using JWST observations, we confirm that the two galaxies lie at a projected separation of 4.5 kpc with a velocity offset of $\sim 680\, {\rm km\, s^{-1}}\ (\delta_z \sim 0.01)$. The pair resides in the core of a known rich overdensity of galaxies, dubbed the "Cosmic Vine". For both pair members, modeling of the Spectral Energy Distributions and faint rest-frame optical emission lines indicates high stellar masses ($\log{(M_\star/M_\odot)}\sim10.9$) and suppressed star formation ($\log{\rm (sSFR/yr^{-1})}<-10$), more than an order of magnitude below the level of the star formation main sequence at this redshift. We then explore the Illustris-TNG simulation and the GAEA and SHARK semi-analytical models to examine whether they produce a pair of massive quiescent galaxies akin to that of the Cosmic Vine. While all models produce close pairs of massive quiescent galaxies at $2<z<4$ with comparable separations and velocity offsets, their predicted number densities are $10-80$ times lower than our observational constraint. This discrepancy cannot be fully explained by coarse time sampling in these models or the general challenge of forming early massive quiescent galaxies in simulations. Given that $>90\%$ of simulated pairs in the models that we analyzed merge by $z=0$, our findings suggest that our observed pair will likely coalesce into a single massive galaxy. The merger, occurring in the dense core of a large-scale structure, might represent a critical event in the formation of a brightest cluster galaxy and the morphological transformation of high-redshift disky quiescent galaxies into early-type ellipticals.
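The quoted redshift offset can be cross-checked against the velocity offset with the standard low-velocity relation $\Delta v \approx c\,\Delta z/(1+z)$. The snippet below is a minimal sketch of that arithmetic; the function name `redshift_offset` is chosen for this example, and the inputs are the rounded values given in the abstract.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def redshift_offset(delta_v_km_s, z):
    """Redshift difference corresponding to a line-of-sight velocity offset at redshift z."""
    return delta_v_km_s / C_KM_S * (1.0 + z)


# Values from the abstract: ~680 km/s at z = 3.44
dz = redshift_offset(680.0, 3.44)
print(f"{dz:.3f}")
```

The result rounds to 0.010, in line with the offset of about 0.01 stated in the abstract.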
Submitted 3 March, 2025;
originally announced March 2025.