-
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Authors:
Jiageng Wu,
Bowen Gu,
Ren Zhou,
Kevin Xie,
Doug Snyder,
Yixing Jiang,
Valentina Carducci,
Richard Wyss,
Rishi J Desai,
Emily Alsentzer,
Leo Anthony Celi,
Adam Rodman,
Sebastian Schneeweiss,
Jonathan H. Chen,
Santiago Romero-Brufau,
Kueiyu Joshua Lin,
Jie Yang
Abstract:
Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, current evaluations of LLMs in clinical contexts remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world electronic health record (EHR) data. Others focus narrowly on specific application scenarios, limiting their generalizability across broader clinical use. To address this gap, we present BRIDGE, a comprehensive multilingual benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages. We systematically evaluated 52 state-of-the-art LLMs (including DeepSeek-R1, GPT-4o, Gemini, and Llama 4) under various inference strategies. With a total of 13,572 experiments, our results reveal substantial performance variation across model sizes, languages, natural language processing tasks, and clinical specialties. Notably, we demonstrate that open-source LLMs can achieve performance comparable to proprietary models, while medically fine-tuned LLMs based on older architectures often underperform versus updated general-purpose models. The BRIDGE and its corresponding leaderboard serve as a foundational resource and a unique reference for the development and evaluation of new LLMs in real-world clinical text understanding.
The BRIDGE leaderboard: https://huggingface.co/spaces/YLab-Open/BRIDGE-Medical-Leaderboard
Submitted 30 April, 2025; v1 submitted 28 April, 2025;
originally announced April 2025.
-
Full analysis of CP violation induced by the decay angular correlations in four-body cascade decays of heavy hadrons
Authors:
Zhen-Hua Zhang,
Jian-Yu Yang,
Xin-Heng Guo
Abstract:
Although violation of charge-parity (CP) symmetry has been observed in many pure meson decay processes, it was confirmed only very recently by the LHCb collaboration in the four-body decay of the heavy baryon $Λ_b^0$, $Λ_b^0\to p K^- π^+ π^-$, through a comparison of the decay branching ratio with that of the CP-conjugate process. However, the detailed dynamics behind this CP asymmetry is far from clear. In this paper, we propose a formalism for the full analysis of the decay angular correlations in four-body cascade decays of heavy hadrons, which can provide more information about the CP violation in these decays. To illustrate this, we apply the decay angular correlation analysis of CP violation to another four-body decay channel that involves baryons, $B^0\to p\bar{p}K^+π^-$, which has also been investigated by the LHCb collaboration with no evidence of CP violation being found. Surprisingly, with the event yield extracted inversely from the published LHCb data, we obtain non-zero CP asymmetries of about $10\%$ corresponding to the decay angular correlations at a confidence level larger than $5σ$. These are considerably larger than the CP asymmetries observed in the $Λ_b^0\to p K^- π^+ π^-$ channel, indicating that CP violation could have been observed in processes involving baryons much earlier if the full analysis of angular correlations had been performed. We suggest that our experimental colleagues perform full decay angular correlation analyses of CP violation in four-body decays of heavy hadrons, including the above two decay channels.
Submitted 27 April, 2025;
originally announced April 2025.
-
Measurements of branching fractions of $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (693 additional authors not shown)
Abstract:
Utilizing $7.9\,\rm fb^{-1}$ of $e^+e^-$ collision data taken with the BESIII detector at the center-of-mass energy of 3.773 GeV, we report the measurements of absolute branching fractions of the hadronic decays $D^0\to K^- 3π^+2π^-$, $D^0\to K^- 2π^+π^-2π^0$ and $D^+\to K^- 3π^+π^-π^0$. The $D^0\to K^- 3π^+2π^-$ decay is measured with improved precision, while the latter two decays are observed with statistical significance higher than $5σ$ for the first time. The absolute branching fractions of these decays are determined to be ${\mathcal B}(D^0\to K^- 3π^+2π^-)=( 1.35\pm 0.23\pm 0.08 )\times 10^{-4}$, ${\mathcal B}(D^0\to K^- 2π^+π^-2π^0)=( 19.0\pm 1.1\pm 1.5)\times 10^{-4}$, and ${\mathcal B}(D^+\to K^- 3π^+π^-π^0)=( 6.57\pm 0.69\pm 0.33)\times 10^{-4}$, where the first uncertainties are statistical and the second systematic.
Submitted 27 April, 2025;
originally announced April 2025.
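The abstract above quotes statistical and systematic uncertainties separately. As a minimal sketch, assuming the two are independent and can be combined in quadrature (a common convention that the abstract itself does not state), the total uncertainty on each branching fraction could be estimated as follows. The function name and dictionary layout are illustrative only.

```python
import math

def total_uncertainty(stat, syst):
    """Combine two uncertainties in quadrature (assumes they are independent)."""
    return math.hypot(stat, syst)

# Central value, statistical and systematic uncertainty, all in units of 1e-4,
# copied from the abstract above.
results = {
    "D0 -> K- 3pi+ 2pi-": (1.35, 0.23, 0.08),
    "D0 -> K- 2pi+ pi- 2pi0": (19.0, 1.1, 1.5),
    "D+ -> K- 3pi+ pi- pi0": (6.57, 0.69, 0.33),
}

for mode, (central, stat, syst) in results.items():
    total = total_uncertainty(stat, syst)
    print(f"{mode}: ({central} +/- {total:.2f}) x 10^-4")
```

Quadrature addition is only appropriate when the two sources are uncorrelated; a correlated treatment would need information that is not in the abstract.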
-
MLICv2: Enhanced Multi-Reference Entropy Modeling for Learned Image Compression
Authors:
Wei Jiang,
Yongqi Zhai,
Jiayu Yang,
Feng Gao,
Ronggang Wang
Abstract:
Recent advancements in learned image compression (LIC) have yielded impressive performance gains. Notably, the learned image compression models with multi-reference entropy models (MLIC series) have significantly outperformed existing traditional image codecs such as the Versatile Video Coding (VVC) Intra. In this paper, we present MLICv2 and MLICv2$^+$, enhanced versions of the MLIC series, featuring improved transform techniques, entropy modeling, and instance adaptability. To improve the transform, we introduce a simple token mixing transform block inspired by the meta transformer architecture, addressing the performance degradation at high bit-rates observed in the previous MLIC series while maintaining computational efficiency. To enhance entropy modeling, we propose a hyperprior-guided global correlation prediction, enabling the capture of global contexts in the initial slice of the latent representation. We also develop a channel reweighting module to dynamically prioritize important channels within each context. Additionally, advanced positional embedding for context modeling and selective compression with guided optimization are investigated. To boost instance adaptability, we employ stochastic Gumbel annealing to iteratively refine the latent representation according to the rate-distortion optimization of a specific input image. This approach further enhances performance without impacting decoding speed. Experimental results demonstrate that MLICv2 and MLICv2$^+$ achieve state-of-the-art performance. Compared to VTM-17.0 Intra, MLICv2 reduces the Bjontegaard-Delta rate (BD-rate) by 16.54%, 21.61%, and 16.05% on the Kodak, Tecnick, and CLIC Pro Val datasets, respectively, and MLICv2$^+$ reduces it by 20.46%, 24.35%, and 19.14%.
Submitted 27 April, 2025;
originally announced April 2025.
-
A Multi-Language Perspective on the Robustness of LLM Code Generation
Authors:
Fazle Rabbi,
Zishuo Ding,
Jinqiu Yang
Abstract:
Large language models have gained significant traction and popularity in recent times, extending their usage to code-generation tasks. While this field has garnered considerable attention, the exploration of testing and evaluating the robustness of code generation models remains an ongoing endeavor. Previous studies have primarily focused on code generation models specifically for the Python language, overlooking other widely used programming languages. In this research, we conduct a comprehensive comparative analysis to assess the robustness performance of several prominent code generation models. Furthermore, we investigate how their performance varies across different programming languages. To accomplish this, we introduce perturbations in four key areas of the prompt: DocString, function name, syntax, and format. We have compiled and released a dedicated dataset for this purpose. This work presents our experimental findings, shedding light on the performance of code generation models in various scenarios.
Submitted 1 May, 2025; v1 submitted 27 April, 2025;
originally announced April 2025.
-
Search for $η_{1}(1855)$ in $χ_{cJ}\toηηη^{\prime}$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (697 additional authors not shown)
Abstract:
Based on a sample of $2.7\times10^{9}$ $ψ(3686)$ events collected by the BESIII detector operating at the BEPCII collider, an analysis of the decay $ψ(3686)\toγχ_{cJ}, χ_{cJ}\toηηη^{\prime}$ is performed. The decay modes $χ_{c1}$ and $χ_{c2}\toηηη^{\prime}$ are observed for the first time, and their corresponding branching fractions are determined to be $\mathcal{B}(χ_{c1}\toηηη^{\prime}) = (1.39 \pm 0.13(\text{stat.}) \pm 0.09(\text{sys.})) \times 10^{-4}$ and $\mathcal{B}(χ_{c2}\toηηη^{\prime}) = (4.42 \pm 0.86(\text{stat.}) \pm 0.37(\text{sys.})) \times 10^{-5}$. An upper limit on the branching fraction of $χ_{c0}\toηηη^{\prime}$ is set as $2.64 \times 10^{-5}$ at 90% confidence level (CL). A partial wave analysis (PWA) of the decay $χ_{c1}\toηηη^{\prime}$ is performed to search for the $1^{-+}$ exotic state $η_1(1855)$. The PWA result indicates that the structure in the $ηη^{\prime}$ mass spectrum is mainly attributed to the $f_0(1500)$, while in the $ηη$ mass spectrum, it is primarily the $0^{++}$ phase space. The upper limit of $\mathcal{B}(χ_{c1}\toη_{1}(1855)η) \cdot \mathcal{B}(η_{1}(1855)\toηη^{\prime})< 9.79 \times 10^{-5}$ is set based on the PWA at 90% CL.
Submitted 26 April, 2025;
originally announced April 2025.
-
Boosting Single-domain Generalized Object Detection via Vision-Language Knowledge Interaction
Authors:
Xiaoran Xu,
Jiangang Yang,
Wenyue Chong,
Wenhui Shi,
Shichu Sun,
Jing Xing,
Jian Liu
Abstract:
Single-Domain Generalized Object Detection (S-DGOD) aims to train an object detector on a single source domain while generalizing well to diverse unseen target domains, making it suitable for multimedia applications that involve various domain shifts, such as intelligent video surveillance and VR/AR technologies. With the success of large-scale Vision-Language Models, recent S-DGOD approaches exploit pre-trained vision-language knowledge to guide invariant feature learning across visual domains. However, the utilized knowledge remains at a coarse-grained level (e.g., the textual description of adverse weather paired with the image) and serves as an implicit regularization for guidance, struggling to learn accurate region- and object-level features in varying domains. In this work, we propose a new cross-modal feature learning method, which can capture generalized and discriminative regional features for S-DGOD tasks. The core of our method is the mechanism of Cross-modal and Region-aware Feature Interaction, which simultaneously learns both inter-modal and intra-modal regional invariance through dynamic interactions between fine-grained textual and visual features. Moreover, we design a simple but effective strategy called Cross-domain Proposal Refining and Mixing, which aligns the position of region proposals across multiple domains and diversifies them, enhancing the localization ability of detectors in unseen scenarios. Our method achieves new state-of-the-art results on S-DGOD benchmark datasets, with improvements of +8.8% mPC on Cityscapes-C and +7.9% mPC on DWD over baselines, demonstrating its efficacy.
Submitted 26 April, 2025;
originally announced April 2025.
-
Entrywise Approximate Matrix Inversion
Authors:
Mehrdad Ghadiri,
Junzhao Yang
Abstract:
We study the bit complexity of inverting diagonally dominant matrices, which are associated with random walk quantities such as hitting times and escape probabilities. Such quantities can be exponentially small, even on undirected unit-weighted graphs. However, their nonnegativity suggests that they can be approximated entrywise, leading to a stronger notion of approximation than vector norm-based error.
Under this notion of error, existing Laplacian solvers and fast matrix multiplication approaches have bit complexities of $mn^2$ and $n^{ω+1}$, respectively, where $m$ is the number of nonzero entries in the matrix, $n$ is its size, and $ω$ is the matrix multiplication exponent.
We present algorithms that compute entrywise $\exp(ε)$-approximate inverses of row diagonally dominant $L$-matrices (RDDL) in two settings: (1) when the matrix entries are given in floating-point representation; (2) when they are given in fixed-point representation.
For floating-point inputs, we present a cubic-time algorithm and show that it has an optimal running time under the all-pairs shortest paths (APSP) conjecture.
For fixed-point inputs, we present several algorithms for solving linear systems and inverting RDDL and SDDM matrices, all with high probability.
Omitting logarithmic factors:
(1) For SDDM matrices, we provide an algorithm for solving a linear system with entrywise approximation guarantees using $\tilde{O}(m\sqrt{n})$ bit operations, and another for computing an entrywise approximate inverse using $\tilde{O}(mn)$ bit operations.
(2) For RDDL matrices, we present an algorithm for solving a linear system using $\tilde{O}(mn^{1+o(1)})$ bit operations, and two algorithms for computing an entrywise approximate inverse: one using $\tilde{O}(n^{ω+0.5})$ bit operations, and the other using $\tilde{O}(mn^{1.5+o(1)})$ bit operations.
Submitted 26 April, 2025;
originally announced April 2025.
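To make the contrast between entrywise and norm-based error concrete, here is a minimal sketch. It assumes that an entrywise $\exp(ε)$-approximation of a nonnegative quantity $x$ means a value $y$ with $e^{-ε}x \le y \le e^{ε}x$; the abstract does not spell out the definition, and the helper names and toy numbers below are hypothetical.

```python
import math

def is_entrywise_approx(approx, exact, eps):
    """Check exp(-eps) * exact_ij <= approx_ij <= exp(eps) * exact_ij for all entries.

    Assumes every entry of `exact` is nonnegative.
    """
    lo, hi = math.exp(-eps), math.exp(eps)
    return all(
        lo * e <= a <= hi * e
        for row_a, row_e in zip(approx, exact)
        for a, e in zip(row_a, row_e)
    )

def max_abs_error(approx, exact):
    """Largest absolute entry difference, a simple norm-style error measure."""
    return max(
        abs(a - e)
        for row_a, row_e in zip(approx, exact)
        for a, e in zip(row_a, row_e)
    )

# Toy "inverse" with one exponentially small entry (values are illustrative).
exact = [[1.0, 1e-12]]
rounded_to_zero = [[1.0, 0.0]]      # tiny absolute error, but loses the small entry
relative_guess = [[1.05, 1.04e-12]]  # every entry within a factor exp(0.1)

print(max_abs_error(rounded_to_zero, exact))             # very small
print(is_entrywise_approx(rounded_to_zero, exact, 0.1))  # entrywise check fails
print(is_entrywise_approx(relative_guess, exact, 0.1))   # entrywise check passes
```

The example shows why the entrywise notion is stronger: rounding an exponentially small entry to zero is harmless under an absolute error measure but violates the multiplicative bound.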
-
Insights on Metal Enrichment and Environmental Effect at $z\approx5-7$ with JWST ASPIRE/EIGER and Chemical Evolution Model
Authors:
Zihao Li,
Koki Kakiichi,
Lise Christensen,
Zheng Cai,
Avishai Dekel,
Xiaohui Fan,
Emanuele Paolo Farina,
Hyunsung D. Jun,
Zhaozhou Li,
Mingyu Li,
Maria Pudoka,
Fengwu Sun,
Maxime Trebitsch,
Fabian Walter,
Feige Wang,
Jinyi Yang,
Huanian Zhang,
Siwei Zou
Abstract:
We present the mass-metallicity relation (MZR) for a parent sample of 604 galaxies at $z=5.34-6.94$ with [O III] doublets detected, using the deep JWST/NIRCam wide field slitless spectroscopic (WFSS) observations in 26 quasar fields. The sample incorporates the full observations of 25 quasar fields from the JWST Cycle 1 GO program ASPIRE and the quasar SDSS J0100+2802 from the JWST EIGER program. We identify 204 galaxies residing in overdense structures using a friends-of-friends (FoF) algorithm. We estimate an electron temperature of $2.0^{+0.3}_{-0.4}\times10^4$ K from the H$γ$ and [O III] $λ4363$ lines in the stacked spectrum, indicating a metal-poor sample with median gas-phase metallicity 12+$\log(\mathrm{O/H})=7.64^{+0.23}_{-0.11}$. With the most up-to-date strong line calibration based on NIRSpec observations, we find that the MZR shows a metal enhancement of $\sim0.2$ dex at the high-mass end in overdense environments. However, our galaxy sample at $z>5$ shows a metal deficiency of $\sim0.2$ dex relative to the predictions of the local Fundamental Metallicity Relation (FMR). We explain the observed trend of the FMR with a simple analytical model, and we favor dilution from intense gas accretion over outflow to explain the metallicity properties at $z>5$. These high-redshift galaxies are likely in a rapid gas accretion phase in which their metal and gas contents are in a non-equilibrium state. According to model predictions, the protocluster members are closer to the gas equilibrium state than field galaxies and thus have higher metallicity and are closer to the local FMR. Our results suggest that the accelerated star formation during protocluster assembly likely plays a key role in shaping the observed MZR and FMR, indicating a potentially earlier onset of metal enrichment in overdense environments at $z\approx5-7$.
△ Less
Submitted 25 April, 2025;
originally announced April 2025.
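The friends-of-friends (FoF) grouping mentioned in the abstract above can be sketched generically as follows. This is an illustrative union-find version with a single isotropic linking length; the linking lengths, coordinates, and any separate line-of-sight treatment used in the paper are not given in the abstract, so the values here are hypothetical.

```python
from itertools import combinations
import math

def friends_of_friends(points, linking_length):
    """Group points so that any two closer than `linking_length` share a group.

    Groups are the connected components of the "is a friend of" graph,
    found with a small union-find structure. Returns lists of point indices,
    largest group first.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pair loop: fine for a sketch, too slow for large catalogues.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= linking_length:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical 3D positions in arbitrary units: a chain of three plus an isolated point.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(friends_of_friends(positions, linking_length=1.5))
```

Points 0 and 2 are farther apart than the linking length but end up in the same group because both are friends of point 1, which is the defining property of FoF.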
-
Learning Operators by Regularized Stochastic Gradient Descent with Operator-valued Kernels
Authors:
Jia-Qi Yang,
Lei Shi
Abstract:
This paper investigates regularized stochastic gradient descent (SGD) algorithms for estimating nonlinear operators from a Polish space to a separable Hilbert space. We assume that the regression operator lies in a vector-valued reproducing kernel Hilbert space induced by an operator-valued kernel. Two significant settings are considered: an online setting with polynomially decaying step sizes and regularization parameters, and a finite-horizon setting with constant step sizes and regularization parameters. We introduce regularity conditions on the structure and smoothness of the target operator and the input random variables. Under these conditions, we provide a dimension-free convergence analysis for the prediction and estimation errors, deriving both expectation and high-probability error bounds. Our analysis demonstrates that these convergence rates are nearly optimal. Furthermore, we present a new technique for deriving bounds with high probability for general SGD schemes, which also ensures almost-sure convergence. Finally, we discuss potential extensions to more general operator-valued kernels and the encoder-decoder framework.
Submitted 25 April, 2025;
originally announced April 2025.
-
Salient Region-Guided Spacecraft Image Arbitrary-Scale Super-Resolution Network
Authors:
Jingfan Yang,
Hu Gao,
Ying Zhang,
Depeng Dang
Abstract:
Spacecraft image super-resolution seeks to enhance low-resolution spacecraft images into high-resolution ones. Although existing arbitrary-scale super-resolution methods perform well on general images, they tend to overlook the difference in features between the spacecraft core region and the large black space background, introducing irrelevant noise. In this paper, we propose a salient region-guided spacecraft image arbitrary-scale super-resolution network (SGSASR), which uses features from the spacecraft core salient regions to guide latent modulation and achieve arbitrary-scale super-resolution. Specifically, we design a spacecraft core region recognition block (SCRRB) that identifies the core salient regions in spacecraft images using a pre-trained saliency detection model. Furthermore, we present an adaptive-weighted feature fusion enhancement mechanism (AFFEM) that selectively aggregates the spacecraft core region features with general image features using a dynamic weight parameter, enhancing the response of the core salient regions. Experimental results demonstrate that the proposed SGSASR outperforms state-of-the-art approaches.
Submitted 25 April, 2025;
originally announced April 2025.
-
Automating Function-Level TARA for Automotive Full-Lifecycle Security
Authors:
Yuqiao Yang,
Yongzhao Zhang,
Wenhao Liu,
Jun Li,
Pengtao Shi,
DingYu Zhong,
Jie Yang,
Ting Chen,
Sheng Cao,
Yuntao Ren,
Yongyue Wu,
Xiaosong Zhang
Abstract:
As modern vehicles evolve into intelligent and connected systems, their growing complexity introduces significant cybersecurity risks. Threat Analysis and Risk Assessment (TARA) has therefore become essential for managing these risks under mandatory regulations. However, existing TARA automation methods rely on static threat libraries, limiting their utility in the detailed, function-level analyses demanded by industry. This paper introduces DefenseWeaver, the first system that automates function-level TARA using component-specific details and large language models (LLMs). DefenseWeaver dynamically generates attack trees and risk evaluations from system configurations described in an extended OpenXSAM++ format, then employs a multi-agent framework to coordinate specialized LLM roles for more robust analysis. To further adapt to evolving threats and diverse standards, DefenseWeaver incorporates Low-Rank Adaptation (LoRA) fine-tuning and Retrieval-Augmented Generation (RAG) with expert-curated TARA reports. We validated DefenseWeaver through deployment in four automotive security projects, where it identified 11 critical attack paths, verified through penetration testing, and subsequently reported and remediated by the relevant automakers and suppliers. Additionally, DefenseWeaver demonstrated cross-domain adaptability, successfully applying to unmanned aerial vehicles (UAVs) and marine navigation systems. In comparison to human experts, DefenseWeaver outperformed manual attack tree generation across six assessment scenarios. Integrated into commercial cybersecurity platforms such as UAES and Xiaomi, DefenseWeaver has generated over 8,200 attack trees. These results highlight its ability to significantly reduce processing time, and its scalability and transformative impact on cybersecurity across industries.
Submitted 25 April, 2025;
originally announced April 2025.
-
Robust Poling and Frequency Conversion on Thin-Film Periodically Poled Lithium Tantalate
Authors:
Anna Shelton,
C. J. Xin,
Keith Powell,
Jiayu Yang,
Shengyuan Lu,
Neil Sinclair,
Marko Loncar
Abstract:
We explore a robust fabrication process for periodically-poled thin-film lithium tantalate (PP-TFLT) by systematically varying fabrication parameters and confirming the quality of inverted domains with second-harmonic microscopy (SHM). We find a periodic poling recipe that works for both acoustic-grade and optical-grade film and across variations in electrode material and the presence of an oxide interlayer. By using a single high-voltage electrical pulse with a peak voltage time of 10 ms or less and a ramp-down time of 90 s, rectangular poling domains are established and stabilized in the PP-TFLT. We employ our robust periodic poling process in a controllable pole-after-etch approach to produce PP-TFLT ridge waveguides with normalized second harmonic generation (SHG) conversion efficiencies of 208 %W$^{-1}$cm$^{-2}$ from 1550 nm to 775 nm, in line with the theoretical value of 244 %W$^{-1}$cm$^{-2}$. This work establishes a high-performance poling process and demonstrates telecommunications band SHG for thin-film lithium tantalate, expanding the capabilities of the platform for frequency mixing applications in quantum photonics, sensing, and spectroscopy.
Submitted 24 April, 2025;
originally announced April 2025.
-
OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning
Authors:
Anirudhan Badrinath,
Alex Yang,
Kousik Rajesh,
Prabhat Agarwal,
Jaewon Yang,
Haoyu Chen,
Jiajing Xu,
Charles Rosenberg
Abstract:
Representation learning, a task of learning latent vectors to represent entities, is a key task in improving search and recommender systems in web applications. Various representation learning methods have been developed, including graph-based approaches for relationships among entities, sequence-based methods for capturing the temporal evolution of user activities, and content-based models for leveraging text and visual content. However, the development of a unifying framework that integrates these diverse techniques to support multiple applications remains a significant challenge. This paper presents OmniSage, a large-scale representation framework that learns universal representations for a variety of applications at Pinterest. OmniSage integrates graph neural networks with content-based models and user sequence models by employing multiple contrastive learning tasks to effectively process graph data, user sequence data, and content signals. To support the training and inference of OmniSage, we developed an efficient infrastructure capable of supporting Pinterest graphs with billions of nodes. The universal representations generated by OmniSage have significantly enhanced user experiences on Pinterest, leading to an approximate 2.5% increase in sitewide repins (saves) across five applications. This paper highlights the impact of unifying representation learning methods, and we will open source the OmniSage code by the time of publication.
Submitted 1 May, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-World Object Counting
Authors:
Yiming Zhao,
Guorong Li,
Laiyun Qing,
Amin Beheshti,
Jian Yang,
Michael Sheng,
Yuankai Qi,
Qingming Huang
Abstract:
Open-world object counting leverages the robust text-image alignment of pre-trained vision-language models (VLMs) to enable counting of arbitrary categories in images specified by textual queries. However, widely adopted naive fine-tuning strategies concentrate exclusively on text-image consistency for categories contained in training, which leads to limited generalizability for unseen categories. In this work, we propose a plug-and-play Semantic-Driven Visual Prompt Tuning framework (SDVPT) that transfers knowledge from the training set to unseen categories with minimal overhead in parameters and inference time. First, we introduce a two-stage visual prompt learning strategy composed of Category-Specific Prompt Initialization (CSPI) and Topology-Guided Prompt Refinement (TGPR). The CSPI generates category-specific visual prompts, and then TGPR distills latent structural patterns from the VLM's text encoder to refine these prompts. During inference, we dynamically synthesize the visual prompts for unseen categories based on the semantic correlation between unseen and training categories, facilitating robust text-image alignment for unseen categories. Extensive experiments integrating SDVPT with all available open-world object counting models demonstrate its effectiveness and adaptability across three widely used datasets: FSC-147, CARPK, and PUCPR+.
Submitted 24 April, 2025;
originally announced April 2025.
-
Precision Neural Network Quantization via Learnable Adaptive Modules
Authors:
Wenqiang Zhou,
Zhendong Yu,
Xinyu Liu,
Jiaming Yang,
Rong Xiao,
Tao Wang,
Chenwei Tang,
Jiancheng Lv
Abstract:
Quantization Aware Training (QAT) is a neural network quantization technique that compresses model size and improves operational efficiency while effectively maintaining model performance. The paradigm of QAT is to introduce fake quantization operators during the training process, allowing the model to autonomously compensate for information loss caused by quantization. Making quantization parameters trainable can significantly improve the performance of QAT, but at the cost of flexibility during inference, especially when dealing with activation values with substantially different distributions. In this paper, we propose an effective learnable adaptive neural network quantization method, called Adaptive Step Size Quantization (ASQ), to resolve this conflict. Specifically, the proposed ASQ method first dynamically adjusts quantization scaling factors through a trained module capable of accommodating different activations. Then, to address the rigid resolution issue inherent in Power of Two (POT) quantization, we propose an efficient non-uniform quantization scheme. We utilize the Power Of Square root of Two (POST) as the basis for exponential quantization, effectively handling the bell-shaped distribution of neural network weights across various bit-widths while maintaining computational efficiency through a Look-Up Table (LUT) method. Extensive experimental results demonstrate that the proposed ASQ method is superior to state-of-the-art QAT approaches. Notably, ASQ is even competitive with full-precision baselines, with its 4-bit quantized ResNet34 model improving accuracy by 1.2% on ImageNet.
Submitted 24 April, 2025;
originally announced April 2025.
-
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?
Authors:
Kaidong Feng,
Zhu Sun,
Jie Yang,
Hui Fang,
Xinghua Qu,
Wenyuan Liu
Abstract:
LLMs are increasingly explored for bundle generation, thanks to their reasoning capabilities and knowledge. However, deploying large-scale LLMs introduces significant efficiency challenges, primarily high computational costs during fine-tuning and inference due to their massive parameterization. Knowledge distillation (KD) offers a promising solution, transferring expertise from large teacher models to compact student models. This study systematically investigates knowledge distillation approaches for bundle generation, aiming to minimize computational demands while preserving performance. We explore three critical research questions: (1) how does the format of KD impact bundle generation performance? (2) to what extent does the quantity of distilled knowledge influence performance? and (3) how do different ways of utilizing the distilled knowledge affect performance? We propose a comprehensive KD framework that (i) progressively extracts knowledge (patterns, rules, deep thoughts); (ii) captures varying quantities of distilled knowledge through different strategies; and (iii) exploits complementary LLM adaptation techniques (in-context learning, supervised fine-tuning, combination) to leverage distilled knowledge in small student models for domain-specific adaptation and enhanced efficiency. Extensive experiments provide valuable insights into how knowledge format, quantity, and utilization methodologies collectively shape LLM-based bundle generation performance, exhibiting KD's significant potential for more efficient yet effective LLM-based bundle generation.
Submitted 23 April, 2025;
originally announced April 2025.
-
Unveiling Solitonic Collisions in Mechanical Metamaterials
Authors:
Yasuhiro Miyazawa,
Christopher Chong,
Panayotis G. Kevrekidis,
Jinkyu Yang
Abstract:
Interactions between solitary waves have been pivotal to understanding nonlinear phenomena across various disciplines. The dynamics of rarefaction solitary waves holds great potential, yet their fundamental characteristics and interactions remain only partially understood through experimental means in mechanical metamaterials. Previous studies highlighted their existence and proposed applications, such as waveguides, impact mitigation, and energy harvesting. Challenges, including energy dissipation and a lack of precise measurement techniques, have hindered deeper exploration, most notably of solitonic collisions. In this work, we provide a definitive platform for examining pure rarefaction solitons propagating through a strain-softening mechanical lattice, addressing these challenges. Employing a theoretical framework based on the Boussinesq approximation and multiple-scale analysis, we predict soliton behavior, including phase shifts resulting from head-on collisions. These theoretical insights are corroborated through numerical simulations and systematic experiments designed to generate and measure pure rarefaction solitons with high precision. Both symmetric and asymmetric collisions are examined, revealing practically elastic interaction behaviors and amplitude-dependent phase shifts. Furthermore, collision dynamics, such as speed and phase shifts during rarefaction soliton collisions, from the experimental results show agreement with theoretical and numerical models. These results validate our experimental platform and findings, underscoring the potential of mechanical rarefaction solitons as robust, controllable wave packets. This suggests a robust paradigm for exploring nonlinear wave interactions in mechanical systems, opening new application avenues in mechanical metamaterials, such as wave-based computing and advanced signal processing.
Submitted 20 May, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Enhancing LLM-Based Agents via Global Planning and Hierarchical Execution
Authors:
Junjie Chen,
Haitao Li,
Jingli Yang,
Yiqun Liu,
Qingyao Ai
Abstract:
Intelligent agent systems based on Large Language Models (LLMs) have shown great potential in real-world applications. However, existing agent frameworks still face critical limitations in task planning and execution, restricting their effectiveness and generalizability. Specifically, current planning methods often lack clear global goals, leading agents to get stuck in local branches, or produce non-executable plans. Meanwhile, existing execution mechanisms struggle to balance complexity and stability, and their limited action space restricts their ability to handle diverse real-world tasks. To address these limitations, we propose GoalAct, a novel agent framework that introduces a continuously updated global planning mechanism and integrates a hierarchical execution strategy. GoalAct decomposes task execution into high-level skills, including searching, coding, writing and more, thereby reducing planning complexity while enhancing the agents' adaptability across diverse task scenarios. We evaluate GoalAct on LegalAgentBench, a benchmark with multiple types of legal tasks that require the use of multiple types of tools. Experimental results demonstrate that GoalAct achieves state-of-the-art (SOTA) performance, with an average improvement of 12.22% in success rate. These findings highlight GoalAct's potential to drive the development of more advanced intelligent agent systems, making them more effective across complex real-world applications. Our code can be found at https://github.com/cjj826/GoalAct.
Submitted 29 April, 2025; v1 submitted 23 April, 2025;
originally announced April 2025.
-
Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach
Authors:
Penghui Li,
Songchen Yao,
Josef Sarfati Korich,
Changhua Luo,
Jianjia Yu,
Yinzhi Cao,
Junfeng Yang
Abstract:
Static vulnerability detection is still a challenging problem and demands excessive human effort, e.g., manual curation of good vulnerability patterns. No prior work, whether classic program analysis or Large Language Model (LLM)-based approaches, has fully automated such vulnerability pattern generation with reasonable detection accuracy. In this paper, we design and implement MoCQ, a novel holistic neuro-symbolic framework that combines the complementary strengths of LLMs and classical static analysis to enable scalable vulnerability detection. The key insight is that MoCQ leverages an LLM to automatically extract vulnerability patterns and translate them into detection queries, and then relies on static analysis to refine such queries in a feedback loop and eventually execute them for analyzing large codebases and mining vulnerabilities. We evaluate MoCQ on seven types of vulnerabilities spanning two programming languages. We found that MoCQ-generated queries uncovered at least 12 patterns that were missed by experts. On a ground truth dataset, MoCQ achieved precision and recall comparable to expert-crafted queries. Moreover, MoCQ has identified seven previously unknown vulnerabilities in real-world applications, demonstrating its practical effectiveness. We have responsibly disclosed them to the corresponding developers.
Submitted 23 April, 2025; v1 submitted 22 April, 2025;
originally announced April 2025.
-
DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy
Authors:
Qirui Yang,
Fangpu Zhang,
Yeying Jin,
Qihua Cheng,
Pengtao Jiang,
Huanjing Yue,
Jingyu Yang
Abstract:
With the rapid advancement of mobile imaging, capturing screens using smartphones has become a prevalent practice in distance learning and conference recording. However, moiré artifacts, caused by frequency aliasing between display screens and camera sensors, are further amplified by the image signal processing pipeline, leading to severe visual degradation. Existing sRGB domain demoiréing methods struggle with irreversible information loss, while recent two-stage raw domain approaches suffer from information bottlenecks and inference inefficiency. To address these limitations, we propose a single-stage raw domain demoiréing framework, Dual-Stream Demoiréing Network (DSDNet), which leverages the synergy of raw and YCbCr images to remove moiré while preserving luminance and color fidelity. Specifically, to guide luminance correction and moiré removal, we design a raw-to-YCbCr mapping pipeline and introduce the Synergic Attention with Dynamic Modulation (SADM) module. This module enriches the raw-to-sRGB conversion with cross-domain contextual features. Furthermore, to better guide color fidelity, we develop a Luminance-Chrominance Adaptive Transformer (LCAT), which decouples luminance and chrominance representations. Extensive experiments demonstrate that DSDNet outperforms state-of-the-art methods in both visual quality and quantitative evaluation, and achieves an inference speed 2.4x faster than the second-best method, highlighting its practical advantages. We provide an anonymous online demo at https://xxxxxxxxdsdnet.github.io/DSDNet/.
Submitted 22 April, 2025;
originally announced April 2025.
-
Efficient and Safe Planner for Automated Driving on Ramps Considering Unsatisfication
Authors:
Qinghao Li,
Zhen Tian,
Xiaodan Wang,
Jinming Yang,
Zhihao Lin
Abstract:
Automated driving on ramps presents significant challenges due to the need to balance both safety and efficiency during lane changes. This paper proposes an integrated planner for automated vehicles (AVs) on ramps, utilizing an unsatisfactory level metric for efficiency and arrow-cluster-based sampling for safety. The planner identifies optimal times for the AV to change lanes, taking into account the vehicle's velocity as a key factor in efficiency. Additionally, the integrated planner employs arrow-cluster-based sampling to evaluate collision risks and select an optimal lane-changing curve. Extensive simulations were conducted in a ramp scenario to verify the planner's efficient and safe performance. The results demonstrate that the proposed planner can effectively select an appropriate lane-changing time point and a safe lane-changing curve for AVs, without incurring any collisions during the maneuver.
Submitted 20 April, 2025;
originally announced April 2025.
-
EditLord: Learning Code Transformation Rules for Code Editing
Authors:
Weichen Li,
Albert Jan,
Baishakhi Ray,
Chengzhi Mao,
Junfeng Yang,
Kexin Pei
Abstract:
Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code's intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, omitting the fact that code-editing procedures inherently consist of discrete and explicit steps. Thus, they suffer from suboptimal performance and lack of robustness and generalization. We introduce EditLord, a code editing framework that makes the code transformation steps explicit. Our key insight is to employ a language model (LM) as an inductive learner to extract code editing rules from the training code pairs as concise meta-rule sets. Such rule sets will be manifested for each training sample to augment them for finetuning or assist in prompting- and iterative-based code editing. EditLord outperforms the state-of-the-art by an average of 22.7% in editing performance and 58.1% in robustness while achieving 20.2% higher functional correctness across critical software engineering and security applications, LM models, and editing modes.
Submitted 23 April, 2025; v1 submitted 10 March, 2025;
originally announced April 2025.
-
Adaptive sieving with semismooth Newton proximal augmented Lagrangian algorithm for multi-task Lasso problems
Authors:
Lanyu Lin,
Yong-Jin Liu,
Bo Wang,
Junfeng Yang
Abstract:
Multi-task learning enhances model generalization by jointly learning from related tasks. This paper focuses on the $\ell_{1,\infty}$-norm constrained multi-task learning problem, which promotes a shared feature representation while inducing sparsity in task-specific parameters. We propose an adaptive sieving (AS) strategy to efficiently generate a solution path for multi-task Lasso problems. Each subproblem along the path is solved via an inexact semismooth Newton proximal augmented Lagrangian (Ssnpal) algorithm, achieving an asymptotically superlinear convergence rate. By exploiting the Karush-Kuhn-Tucker (KKT) conditions and the inherent sparsity of multi-task Lasso solutions, the Ssnpal algorithm solves a sequence of reduced subproblems with small dimensions. This approach enables our method to scale effectively to large problems. Numerical experiments on synthetic and real-world datasets demonstrate the superior efficiency and robustness of our algorithm compared to state-of-the-art solvers.
Submitted 21 April, 2025;
originally announced April 2025.
-
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core
Authors:
Dennis Liu,
Zijie Yan,
Xin Yao,
Tong Liu,
Vijay Korthikanti,
Evan Wu,
Shiqing Fan,
Gao Deng,
Hongxiao Bai,
Jianbin Chang,
Ashwath Aithal,
Michael Andersch,
Mohammad Shoeybi,
Jiajie Yao,
Chandler Zhou,
David Wu,
Xipeng Li,
June Yang
Abstract:
Mixture of Experts (MoE) models enhance neural network scalability by dynamically selecting relevant experts per input token, enabling larger model sizes while maintaining manageable computation costs. However, efficient training of large-scale MoE models across thousands of GPUs presents significant challenges due to limitations in existing parallelism strategies. We introduce an end-to-end training framework for large-scale MoE models that utilizes five-dimensional hybrid parallelism: Tensor Parallelism, Expert Parallelism, Context Parallelism, Data Parallelism, and Pipeline Parallelism. Central to our approach is MoE Parallel Folding, a novel strategy that decouples the parallelization of attention and MoE layers in Transformer models, allowing each layer type to adopt optimal parallel configurations. Additionally, we develop a flexible token-level dispatcher that supports both token-dropping and token-dropless MoE training across all five dimensions of parallelism. This dispatcher accommodates dynamic tensor shapes and coordinates different parallelism schemes for Attention and MoE layers, facilitating complex parallelism implementations. Our experiments demonstrate significant improvements in training efficiency and scalability. We achieve up to 49.3% Model Flops Utilization (MFU) for the Mixtral 8x22B model and 39.0% MFU for the Qwen2-57B-A14B model on H100 GPUs, outperforming existing methods. The framework scales efficiently up to 1,024 GPUs and maintains high performance with sequence lengths up to 128K tokens, validating its effectiveness for large-scale MoE model training. The code is available in Megatron-Core.
Submitted 23 April, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends
Authors:
Jiaxin GUO,
Xiaoyu Chen,
Zhiqiang Rao,
Jinlong Yang,
Zongyao Li,
Hengchao Shang,
Daimeng Wei,
Hao Yang
Abstract:
With the rapid development of deep learning technologies, the field of machine translation has witnessed significant progress, especially with the advent of large language models (LLMs) that have greatly propelled the advancement of document-level translation. However, accurately evaluating the quality of document-level translation remains an urgent issue. This paper first introduces the development status of document-level translation and the importance of evaluation, highlighting the crucial role of automatic evaluation metrics in reflecting translation quality and guiding the improvement of translation systems. It then provides a detailed analysis of the current state of automatic evaluation schemes and metrics, including evaluation methods with and without reference texts, as well as traditional metrics, model-based metrics, and LLM-based metrics. Subsequently, the paper explores the challenges faced by current evaluation methods, such as the lack of reference diversity, dependence on sentence-level alignment information, and the bias, inaccuracy, and lack of interpretability of the LLM-as-a-judge method. Finally, the paper looks ahead to future trends in evaluation methods, including the development of more user-friendly document-level evaluation methods and more robust LLM-as-a-judge methods, and proposes possible research directions, such as reducing the dependency on sentence-level information, introducing multi-level and multi-granular evaluation approaches, and training models specifically for machine translation evaluation. This study aims to provide a comprehensive analysis of automatic evaluation for document-level translation and offer insights into future developments.
Submitted 20 April, 2025;
originally announced April 2025.
-
Enhanced Data-driven Topology Design Methodology with Multi-level Mesh and Correlation-based Mutation for Stress-related Multi-objective Optimization
Authors:
Jun Yang,
Shintaro Yamasaki
Abstract:
Topology optimization (TO) serves as a widely applied structural design approach to tackle various engineering problems. Nevertheless, sensitivity-based TO methods usually struggle with solving strongly nonlinear optimization problems. By leveraging the high capacity of deep generative models, an influential machine learning technique, the sensitivity-free data-driven topology design (DDTD) methodology is regarded as an effective means of overcoming these issues. The DDTD methodology depends on an initial dataset with a certain regularity, making its results highly sensitive to initial dataset quality. This limits its effectiveness and generalizability, especially for optimization problems without a priori information. In this research, we propose a multi-level mesh DDTD-based method with a correlation-based mutation module to free the results from their dependence on the quality of the initial dataset and to enhance computational efficiency. The core is to employ a correlation-based mutation module to assign new geometric features with physical meaning to the generated data, while utilizing a multi-level mesh strategy to progressively refine the structural representation, thus avoiding the maintenance of a high degree-of-freedom (DOF) representation throughout the iterative process. The proposed multi-level mesh DDTD-based method can be driven by a low-quality initial dataset without the need for time-consuming construction of a specific dataset, thus significantly increasing generality and reducing application difficulty, while further lowering the computational cost of the DDTD methodology. Various comparison experiments with traditional sensitivity-based TO methods on stress-related strongly nonlinear problems demonstrate the generality and effectiveness of the proposed method.
Submitted 20 April, 2025;
originally announced April 2025.
-
Data-Driven Evolutionary Game-Based Model Predictive Control for Hybrid Renewable Energy Dispatch in Autonomous Ships
Authors:
Yaoze Liu,
Zhen Tian,
Jinming Yang,
Zhihao Lin
Abstract:
In this paper, we propose a data-driven Evolutionary Game-Based Model Predictive Control (EG-MPC) framework for the energy dispatch of a hybrid renewable energy system powering an autonomous ship. The system integrates solar photovoltaic and wind turbine generation with battery energy storage and diesel backup power to ensure reliable operation. Given the uncertainties in renewable generation and dynamic energy demands, an optimal dispatch strategy is crucial to minimize operational costs while maintaining system reliability. To address these challenges, we formulate a cost minimization problem that considers both battery degradation costs and diesel fuel expenses, leveraging real-world data to enhance modeling accuracy. The EG-MPC approach integrates evolutionary game dynamics within a receding-horizon optimization framework, enabling adaptive and near-optimal control solutions in real time. Simulation results based on site-specific data demonstrate that the proposed method achieves cost-effective, reliable, and adaptive energy dispatch, outperforming conventional rule-based and standard MPC approaches, particularly under uncertainty.
Submitted 20 April, 2025;
originally announced April 2025.
-
Adaptive Field Effect Planner for Safe Interactive Autonomous Driving on Curved Roads
Authors:
Qinghao Li,
Zhen Tian,
Xiaodan Wang,
Jinming Yang,
Zhihao Lin
Abstract:
Autonomous driving has garnered significant attention for its potential to improve safety, traffic efficiency, and user convenience. However, the dynamic and complex nature of interactive driving poses significant challenges, including the need to navigate non-linear road geometries, handle dynamic obstacles, and meet stringent safety and comfort requirements. Traditional approaches, such as artificial potential fields (APF), often fall short in addressing these complexities independently, necessitating the development of integrated and adaptive frameworks. This paper presents a novel approach to autonomous vehicle navigation that integrates artificial potential fields, Frenet coordinates, and improved particle swarm optimization (IPSO). A dynamic risk field, adapted from traditional APF, is proposed to ensure interactive safety by quantifying risks and dynamically adjusting lane-changing intentions based on surrounding vehicle behavior. Frenet coordinates are utilized to simplify trajectory planning on non-straight roads, while an enhanced quintic polynomial trajectory generator ensures smooth and comfortable path transitions. Additionally, an IPSO algorithm optimizes trajectory selection in real time, balancing safety and user comfort within a feasible input range. The proposed framework is validated through extensive simulations and real-world scenarios, demonstrating its ability to navigate complex traffic environments, maintain safety margins, and generate smooth, dynamically feasible trajectories.
Submitted 20 April, 2025;
originally announced April 2025.
-
An effective finite-range Gogny-type interaction for the quantum molecular dynamics like model
Authors:
Meiqi Sun,
Dandan Niu,
Junping Yang,
Ying Cui,
Zhuxia Li,
Qiang Zhao,
Kai Zhao,
Yingxun Zhang
Abstract:
In this work, we propose an effective finite-range Gogny-type interaction that can be directly used in quantum molecular dynamics (QMD)-like models. Two methods for determining the parameters of the effective interaction are discussed. The first method establishes an approach to connect the conventional Gogny interaction in nuclear structure to that in heavy-ion collisions. The second method allows for the description of the symmetry energy varying from supersoft to stiff, as well as the momentum-dependent symmetry potential, exhibiting behaviors ranging from monotonic to non-monotonic variations. This effective interaction opens up opportunities for a deeper understanding of finite-range interactions and non-monotonic momentum-dependent symmetry potentials in future studies.
Submitted 20 April, 2025;
originally announced April 2025.
-
Sensitivity of the CUPID experiment to $0νββ$ decay of $^{100}$Mo
Authors:
K. Alfonso,
A. Armatol,
C. Augier,
F. T. Avignone III,
O. Azzolini,
A. S. Barabash,
G. Bari,
A. Barresi,
D. Baudin,
F. Bellini,
G. Benato,
L. Benussi,
V. Berest,
M. Beretta,
L. Bergé,
M. Bettelli,
M. Biassoni,
J. Billard,
F. Boffelli,
V. Boldrini,
E. D. Brandani,
C. Brofferio,
C. Bucci,
M. Buchynska,
J. Camilleri
, et al. (167 additional authors not shown)
Abstract:
CUPID is a next-generation bolometric experiment to search for neutrinoless double-beta decay ($0νββ$) of $^{100}$Mo using Li$_2$MoO$_4$ scintillating crystals. It will operate 1596 crystals at $\sim$10 mK in the CUORE cryostat at the Laboratori Nazionali del Gran Sasso in Italy. Each crystal will be facing two Ge-based bolometric light detectors for $α$ rejection. We compute the discovery and the exclusion sensitivity of CUPID to $0νββ$ in a Frequentist and a Bayesian framework. This computation is done numerically based on pseudo-experiments. For the CUPID baseline scenario, with a background and an energy resolution of $1.0 \times 10^{-4}$ counts/keV/kg/yr and 5 keV FWHM at the Q-value, respectively, this results in a Bayesian exclusion sensitivity (90% c.i.) of $\hat{T}_{1/2} > 1.6^{+0.6}_{-0.5} \times 10^{27} \ \mathrm{yr}$, corresponding to the effective Majorana neutrino mass of $\hat{m}_{ββ} < \ 9.6$ -- $16.3 \ \mathrm{meV}$. The Frequentist discovery sensitivity (3$σ$) is $\hat{T}_{1/2}= 1.0 \times 10^{27} \ \mathrm{yr}$, corresponding to $\hat{m}_{ββ}= \ 12.2$ -- $20.6 \ \mathrm{meV}$.
Submitted 19 April, 2025;
originally announced April 2025.
-
Logarithmic Crystalline Representations
Authors:
Zhenmou Liu,
Jinbang Yang,
Kang Zuo
Abstract:
In 1989, Faltings proved the comparison theorem between étale cohomology and crystalline cohomology by studying Fontaine-Faltings modules and crystalline representations. In his paper, he mentioned that these modules and representations can be extended to the logarithmic context, but gave no details. This note aims to explicitly present the construction of logarithmic Fontaine-Faltings modules and logarithmic crystalline representations.
Submitted 19 April, 2025;
originally announced April 2025.
-
Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models
Authors:
Junjie Yang,
Junhao Song,
Xudong Han,
Ziqian Bi,
Tianyang Wang,
Chia Xin Liang,
Xinyuan Song,
Yichao Zhang,
Qian Niu,
Benji Peng,
Keyu Chen,
Ming Liu
Abstract:
Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various applications including image classification, object detection, language modeling, text classification, and sentiment analysis. Recent innovations in KD methods, such as attention-based approaches, block-wise logit distillation, and decoupling distillation, have notably improved student model performance. These techniques focus on stimulus complexity, attention mechanisms, and global information capture to optimize knowledge transfer. In addition, KD has proven effective in compressing large language models while preserving accuracy, reducing computational overhead, and improving inference speed. This survey synthesizes the latest literature, highlighting key findings, contributions, and future directions in knowledge distillation to provide insights for researchers and practitioners on its evolving role in artificial intelligence and machine learning.
Submitted 18 April, 2025;
originally announced April 2025.
-
RefComp: A Reference-guided Unified Framework for Unpaired Point Cloud Completion
Authors:
Yixuan Yang,
Jinyu Yang,
Zixiang Zhao,
Victor Sanchez,
Feng Zheng
Abstract:
The unpaired point cloud completion task aims to complete a partial point cloud by using models trained with no ground truth. Existing unpaired point cloud completion methods are class-aware, i.e., a separate model is needed for each object class. Since they have limited generalization capabilities, these methods perform poorly in real-world scenarios when confronted with a wide range of point clouds of generic 3D objects. In this paper, we propose a novel unpaired point cloud completion framework, namely the Reference-guided Completion (RefComp) framework, which attains strong performance in both the class-aware and class-agnostic training settings. The RefComp framework transforms the unpaired completion problem into a shape translation problem, which is solved in the latent feature space of the partial point clouds. To this end, we introduce the use of partial-complete point cloud pairs, which are retrieved by using the partial point cloud to be completed as a template. These point cloud pairs are used as reference data to guide the completion process. Our RefComp framework uses a reference branch and a target branch with shared parameters for shape fusion and shape translation via a Latent Shape Fusion Module (LSFM) to enhance the structural features along the completion pipeline. Extensive experiments demonstrate that the RefComp framework achieves not only state-of-the-art performance in the class-aware training setting but also competitive results in the class-agnostic training setting on both virtual scans and real-world datasets.
Submitted 18 April, 2025;
originally announced April 2025.
-
Search for $J/ψ\rightarrow K^{0}_{S}K^{0}_{S}$ and $ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using data samples of $(10087\pm 44)\times10^{6}$ $J/ψ$ events and $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, we search for the CP violating decays $J/ψ\rightarrow K^{0}_{S}K^{0}_{S}$ and $ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}$. No significant signals are observed over the expected background yields. The upper limits on their branching fractions are set as $\mathcal{B}(J/ψ\rightarrow K^{0}_{S}K^{0}_{S}) <4.7\times 10^{-9}$ and $\mathcal{B}(ψ(3686)\rightarrow K^{0}_{S}K^{0}_{S}) <1.1\times 10^{-8}$ at the 90% confidence level. These results improve the previous limits by a factor of three for $J/ψ\rightarrow K^{0}_{S} K^{0}_{S}$ and two orders of magnitude for $ψ(3686)\rightarrow K^{0}_{S} K^{0}_{S}$.
Submitted 18 April, 2025;
originally announced April 2025.
-
How to Achieve Higher Accuracy with Less Training Points?
Authors:
Jinghan Yang,
Anupam Pani,
Yunchao Zhang
Abstract:
In the era of large-scale model training, the extensive use of available datasets has resulted in significant computational inefficiencies. To tackle this issue, we explore methods for identifying informative subsets of training data that can achieve comparable or even superior model performance. We propose a technique based on influence functions to determine which training samples should be included in the training set. We conducted empirical evaluations of our method on binary classification tasks utilizing logistic regression models. Our approach demonstrates performance comparable to that of training on the entire dataset while using only 10% of the data. Furthermore, we found that our method achieved even higher accuracy when trained with just 60% of the data.
Submitted 18 April, 2025;
originally announced April 2025.
-
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Authors:
Yang Wu,
Yun Zhu,
Kaihua Zhang,
Jianjun Qian,
Jin Xie,
Jian Yang
Abstract:
3D scene perception demands a large amount of adverse-weather LiDAR data, yet the cost of LiDAR data collection presents a significant scaling-up challenge. To this end, a series of LiDAR simulators have been proposed. Yet, they can only simulate a single adverse weather condition with a single physical model, and the fidelity of the generated data is quite limited. This paper presents WeatherGen, the first unified diverse-weather LiDAR data diffusion generation framework, significantly improving fidelity. Specifically, we first design a map-based data producer, which can provide a vast amount of high-quality diverse-weather data for training purposes. Then, we utilize the diffusion-denoising paradigm to construct a diffusion model. Within this model, we propose a spider mamba generator that gradually restores the disturbed diverse-weather data. The spider mamba models the feature interactions by scanning the LiDAR beam circle or central ray, excellently maintaining the physical structure of the LiDAR data. Subsequently, we design a latent feature aligner that follows the generator to transfer real-world knowledge. Afterward, we devise a contrastive learning-based controller, which equips weather control signals with compact semantic knowledge through language supervision, guiding the diffusion model to generate more discriminative data. Extensive evaluations demonstrate the high generation quality of WeatherGen. Through WeatherGen, we construct the mini-weather dataset, improving the performance of the downstream task under adverse weather conditions. Code is available at https://github.com/wuyang98/weathergen
Submitted 18 April, 2025;
originally announced April 2025.
-
Search for $1^{-+}$ charmonium-like hybrid via $e^{+}e^{-}\rightarrow γη^{(\prime)} η_{c}$ at center-of-mass energies between 4.258 and 4.681 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (696 additional authors not shown)
Abstract:
Using $e^{+}e^{-}$ collision data corresponding to an integrated luminosity of 10.6 fb$^{-1}$ collected at center-of-mass energies between 4.258 and 4.681 GeV with the BESIII detector at the BEPCII collider, we search for the $1^{- +}$ charmonium-like hybrid via $e^{+}e^{-}\rightarrowγηη_{c}$ and $e^{+}e^{-}\rightarrowγη^{\prime}η_{c}$ decays for the first time. No significant signal is observed and the upper limits on the Born cross sections for both processes are set at the 90% confidence level.
Submitted 18 April, 2025;
originally announced April 2025.
-
Improving Sequential Recommenders through Counterfactual Augmentation of System Exposure
Authors:
Ziqi Zhao,
Zhaochun Ren,
Jiyuan Yang,
Zuming Yan,
Zihan Wang,
Liu Yang,
Pengjie Ren,
Zhumin Chen,
Maarten de Rijke,
Xin Xin
Abstract:
In sequential recommendation (SR), system exposure refers to items that are exposed to the user. Typically, only a few of the exposed items would be interacted with by the user. Although SR has achieved great success in predicting future user interests, existing SR methods still fail to fully exploit system exposure data. Most methods only model items that have been interacted with, while the large volume of exposed but non-interacted items is overlooked. Even methods that consider the whole system exposure typically train the recommender using only the logged historical system exposure, without exploring unseen user interests.
In this paper, we propose counterfactual augmentation over system exposure for sequential recommendation (CaseRec). To better model historical system exposure, CaseRec introduces reinforcement learning to account for different exposure rewards. CaseRec uses a decision transformer-based sequential model to take an exposure sequence as input and assigns different rewards according to the user feedback. To further explore unseen user interests, CaseRec performs counterfactual augmentation, where exposed original items are replaced with counterfactual items. Then, a transformer-based user simulator is proposed to predict the user feedback reward for the augmented items. Augmentation, together with the user simulator, constructs counterfactual exposure sequences to uncover new user interests. Finally, CaseRec jointly uses the logged exposure sequences with the counterfactual exposure sequences to train a decision transformer-based sequential model for generating recommendations. Experiments on three real-world benchmarks show the effectiveness of CaseRec. Our code is available at https://github.com/ZiqiZhao1/CaseRec.
Submitted 18 April, 2025;
originally announced April 2025.
-
ProgRoCC: A Progressive Approach to Rough Crowd Counting
Authors:
Shengqin Jiang,
Linfei Li,
Haokui Zhang,
Qingshan Liu,
Amin Beheshti,
Jian Yang,
Anton van den Hengel,
Quan Z. Sheng,
Yuankai Qi
Abstract:
As the number of individuals in a crowd grows, enumeration-based techniques become increasingly infeasible and their estimates increasingly unreliable. We propose instead an estimation-based version of the problem, which we label Rough Crowd Counting, that delivers better accuracy on the basis of training data that is easier to acquire. Rough crowd counting requires only rough annotations of the number of targets in an image, instead of the more traditional, and far more expensive, per-target annotations. We propose an approach to the rough crowd counting problem based on CLIP, termed ProgRoCC. Specifically, we introduce a progressive estimation learning strategy that determines the object count through a coarse-to-fine approach. This approach delivers answers quickly and outperforms the state of the art in semi- and weakly-supervised crowd counting. In addition, we design a vision-language matching adapter that optimizes key-value pairs by mining effective matches of two modalities to refine the visual features, thereby improving the final performance. Extensive experimental results on three widely adopted crowd counting datasets demonstrate the effectiveness of our method.
Submitted 17 April, 2025;
originally announced April 2025.
-
Bibliometric Analysis of Scientific Publications on Blockchain Research and Applications
Authors:
Lingfeng Bao,
Jiameng Yang,
Xiaohu Yang,
Chunming Rong
Abstract:
Since the introduction of Bitcoin in 2008, blockchain technology has garnered widespread attention. Scholars from various research fields, countries, and institutions have published a significant number of papers on this subject. However, there is currently a lack of comprehensive analysis specifically focusing on the scientific publications in the field of blockchain.
To conduct a comprehensive analysis, we compiled a corpus of 41,497 publications in blockchain research from 2008 to 2023 using the Clarivate databases. Through bibliometric and citation analyses, we gained valuable insights into the field. Our study offers an overview of the blockchain research landscape, including country, institution, authorship, and subject categories. Additionally, we identified Emerging Research Areas (ERA) using the co-citation clustering approach, examining factors such as recency, growth, and contributions from different countries/regions. Furthermore, we identified influential publications based on citation velocity and analyzed five representative Research Fronts in detail. This analysis provides a fine-grained examination of specific areas within blockchain research. Our findings contribute to understanding evolving trends, emerging applications, and potential directions for future research in the multidisciplinary field of blockchain.
Submitted 17 April, 2025;
originally announced April 2025.
-
Concept Enhancement Engineering: A Lightweight and Efficient Robust Defense Against Jailbreak Attacks in Embodied AI
Authors:
Jirui Yang,
Zheyu Lin,
Shuhan Yang,
Zhihui Lu,
Xin Du
Abstract:
Embodied Intelligence (EI) systems integrated with large language models (LLMs) face significant security risks, particularly from jailbreak attacks that manipulate models into generating harmful outputs or executing unsafe physical actions. Traditional defense strategies, such as input filtering and output monitoring, often introduce high computational overhead or interfere with task performance in real-time embodied scenarios. To address these challenges, we propose Concept Enhancement Engineering (CEE), a novel defense framework that leverages representation engineering to enhance the safety of embodied LLMs by dynamically steering their internal activations. CEE operates by (1) extracting multilingual safety patterns from model activations, (2) constructing control directions based on safety-aligned concept subspaces, and (3) applying subspace concept rotation to reinforce safe behavior during inference. Our experiments demonstrate that CEE effectively mitigates jailbreak attacks while maintaining task performance, outperforming existing defense methods in both robustness and efficiency. This work contributes a scalable and interpretable safety mechanism for embodied AI, bridging the gap between theoretical representation engineering and practical security applications. Our findings highlight the potential of latent-space interventions as a viable defense paradigm against emerging adversarial threats in physically grounded AI systems.
Submitted 14 April, 2025;
originally announced April 2025.
-
SkyReels-V2: Infinite-length Film Generative Model
Authors:
Guibin Chen,
Dixuan Lin,
Jiangping Yang,
Chunze Lin,
Junchen Zhu,
Mingyuan Fan,
Hao Zhang,
Sheng Chen,
Zheng Chen,
Chengcheng Ma,
Weiming Xiong,
Wei Wang,
Nuo Pang,
Kang Kang,
Zhiheng Xu,
Yuzhe Jin,
Yupeng Liang,
Yubing Song,
Peng Zhao,
Boyuan Xu,
Di Qiu,
Debang Li,
Zhengcong Fei,
Yang Li,
Yahui Zhou
Abstract:
Recent advances in video generation have been driven by diffusion models and autoregressive frameworks, yet critical challenges persist in harmonizing prompt adherence, visual quality, motion dynamics, and duration: compromises in motion dynamics to enhance temporal visual quality, constrained video duration (5-10 seconds) to prioritize resolution, and inadequate shot-aware generation stemming from general-purpose MLLMs' inability to interpret cinematic grammar, such as shot composition, actor expressions, and camera motions. These intertwined limitations hinder realistic long-form synthesis and professional film-style generation. To address these limitations, we propose SkyReels-V2, an Infinite-length Film Generative Model that synergizes a Multi-modal Large Language Model (MLLM), Multi-stage Pretraining, Reinforcement Learning, and a Diffusion Forcing Framework. First, we design a comprehensive structural representation of video that combines the general descriptions by the Multi-modal LLM and the detailed shot language by sub-expert models. Aided by human annotation, we then train a unified Video Captioner, named SkyCaptioner-V1, to efficiently label the video data. Second, we establish progressive-resolution pretraining for the fundamental video generation, followed by a four-stage post-training enhancement: initial concept-balanced Supervised Fine-Tuning (SFT) improves baseline quality; motion-specific Reinforcement Learning (RL) training with human-annotated and synthetic distortion data addresses dynamic artifacts; our diffusion forcing framework with non-decreasing noise schedules enables long-video synthesis in an efficient search space; and a final high-quality SFT refines visual fidelity. All the code and models are available at https://github.com/SkyworkAI/SkyReels-V2.
Submitted 21 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
Hierarchical Feature Learning for Medical Point Clouds via State Space Model
Authors:
Guoqing Zhang,
Jingyun Yang,
Yang Li
Abstract:
Deep learning-based point cloud modeling has been widely investigated as an indispensable component of general shape analysis. Recently, transformers and state space models (SSMs) have shown promising capacities in point cloud learning. However, limited research has been conducted on medical point clouds, which have great potential in disease diagnosis and treatment. This paper presents an SSM-based hierarchical feature learning framework for medical point cloud understanding. Specifically, we down-sample the input into multiple levels through farthest point sampling. At each level, we perform a series of k-nearest neighbor (KNN) queries to aggregate multi-scale structural information. To assist the SSM in processing point clouds, we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points. Point features are calculated progressively from short neighbor sequences and long point sequences through vanilla and group Point SSM blocks, to capture both local patterns and long-range dependencies. To evaluate the proposed method, we build a large-scale medical point cloud dataset named MedPointS for anatomy classification, completion, and segmentation. Extensive experiments conducted on MedPointS demonstrate that our method achieves superior performance across all tasks. The dataset is available at https://flemme-docs.readthedocs.io/en/latest/medpoints.html. Code is merged into a public medical imaging platform: https://github.com/wlsdzyzl/flemme.
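The farthest point sampling step named in the abstract above can be illustrated with a minimal sketch of the generic greedy algorithm. This is not taken from the authors' code: the function name `farthest_point_sampling`, the `start` parameter, and the toy points are assumptions made for illustration.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices so that each new point is the one
    farthest from all points selected so far (hypothetical helper)."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(points)")
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy example: five collinear points, keep three well-spread ones.
toy_points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0), (5, 0, 0)]
toy_indices = farthest_point_sampling(toy_points, 3)
```

Applying the function repeatedly with a decreasing `k` would give a multi-level hierarchy of the kind the abstract describes. This version costs O(n·k) distance evaluations, so practical implementations usually vectorize it.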
Submitted 17 April, 2025;
originally announced April 2025.
-
Simultaneous Polysomnography and Cardiotocography Reveal Temporal Correlation Between Maternal Obstructive Sleep Apnea and Fetal Hypoxia
Authors:
Jingyu Wang,
Donglin Xie,
Jingying Ma,
Yunliang Sun,
Linyan Zhang,
Rui Bai,
Zelin Tu,
Liyue Xu,
Jun Wei,
Jingjing Yang,
Yanan Liu,
Huijie Yi,
Bing Zhou,
Long Zhao,
Xueli Zhang,
Mengling Feng,
Xiaosong Dong,
Guoli Liu,
Fang Han,
Shenda Hong
Abstract:
Background: Obstructive sleep apnea syndrome (OSAS) during pregnancy is common and can negatively affect fetal outcomes. However, studies on the immediate effects of maternal hypoxia on fetal heart rate (FHR) changes are lacking. Methods: We used time-synchronized polysomnography (PSG) and cardiotocography (CTG) data from two cohorts to analyze the correlation between maternal hypoxia and FHR changes (accelerations or decelerations). Maternal hypoxic event characteristics were analyzed using generalized linear modeling (GLM) to assess their associations with different FHR changes. Results: A total of 118 pregnant women participated. FHR changes were significantly associated with maternal hypoxia, primarily characterized by accelerations. A longer hypoxic duration correlated with more significant FHR accelerations (P < 0.05), while prolonged hypoxia and greater SpO2 drop were linked to FHR decelerations (P < 0.05). Both cohorts showed a transient increase in FHR during maternal hypoxia, which returned to baseline after the event resolved. Conclusion: Maternal hypoxia significantly affects FHR, suggesting that maternal OSAS may contribute to fetal hypoxia. These findings highlight the importance of maternal-fetal interactions and provide insights for future interventions.
Submitted 17 April, 2025;
originally announced April 2025.
-
SC3EF: A Joint Self-Correlation and Cross-Correspondence Estimation Framework for Visible and Thermal Image Registration
Authors:
Xi Tong,
Xing Luo,
Jiangxin Yang,
Yanpeng Cao
Abstract:
Multispectral imaging plays a critical role in a range of intelligent transportation applications, including advanced driver assistance systems (ADAS), traffic monitoring, and night vision. However, accurate visible and thermal (RGB-T) image registration poses a significant challenge due to the considerable modality differences. In this paper, we present a novel joint Self-Correlation and Cross-Correspondence Estimation Framework (SC3EF), leveraging both local representative features and global contextual cues to effectively generate RGB-T correspondences. For this purpose, we design a convolution-transformer-based pipeline to extract local representative features and encode global correlations of intra-modality for inter-modality correspondence estimation between unaligned visible and thermal images. After merging the local and global correspondence estimation results, we further employ a hierarchical optical flow estimation decoder to progressively refine the estimated dense correspondence maps. Extensive experiments demonstrate the effectiveness of our proposed method, outperforming the current state-of-the-art (SOTA) methods on representative RGB-T datasets. Furthermore, it also shows competitive generalization capabilities across challenging scenarios, including large parallax, severe occlusions, adverse weather, and other cross-modal datasets (e.g., RGB-N and RGB-D).
Submitted 17 April, 2025;
originally announced April 2025.
-
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Yufei Wang,
Wenhan Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Qiyu Rong,
Hongyuan Jing,
Mengmeng Zhang,
Jinglong Li,
Xiangyu Lu,
Yi Ren,
Yuting Liu,
Meng Zhang,
Xiang Chen,
Qiyuan Guan,
Jiangxin Dong,
Jinshan Pan,
Conglin Gou
, et al. (112 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which were developed and evaluated using our collected real-world Raindrop Clarity dataset. Compared with existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, including day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. The competition attracted 361 participants in total, with 32 teams submitting valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.
Submitted 19 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
Embodied Neuromorphic Control Applied on a 7-DOF Robotic Manipulator
Authors:
Ziqi Wang,
Jingyue Zhao,
Jichao Yang,
Yaohua Wang,
Xun Xiao,
Yuan Li,
Chao Xiao,
Lei Wang
Abstract:
The development of artificial intelligence towards real-time interaction with the environment is a key aspect of embodied intelligence and robotics. Inverse dynamics is a fundamental robotics problem, which maps from the joint space to the torque space of robotic systems. Traditional methods for solving it rely on direct physical modeling of robots, which is difficult or even impossible due to nonlinearity and external disturbance. Recently, data-based model-learning algorithms have been adopted to address this issue. However, they often require manual parameter tuning and incur high computational costs. Neuromorphic computing is inherently suited to processing spatiotemporal features in robot motion control at extremely low cost. However, current research is still in its infancy: existing works control only low-degree-of-freedom systems and lack performance quantification and comparison. In this paper, we propose a neuromorphic control framework to control 7-degree-of-freedom robotic manipulators. We use a spiking neural network to leverage the spatiotemporal continuity of the motion data, improving control accuracy and eliminating manual parameter tuning. We validated the algorithm on two robotic platforms, where it reduces torque prediction error by at least 60% and successfully performs a target position tracking task. This work moves embodied neuromorphic control one step forward, from proof of concept to applications in complex real-world tasks.
Submitted 17 April, 2025;
originally announced April 2025.
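For orientation, the inverse dynamics map mentioned in the abstract is commonly written in the standard rigid-body form below. This is textbook notation rather than the paper's own formulation; the symbols (joint positions $q$, inertia matrix $M$, Coriolis matrix $C$, gravity vector $g$, joint torques $\tau$) are assumptions introduced here for illustration.

```latex
\tau = M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + g(q),
\qquad q,\ \dot{q},\ \ddot{q},\ \tau \in \mathbb{R}^{7}
```

A learned model of the kind the abstract describes would approximate the map from $(q, \dot{q}, \ddot{q})$ to $\tau$ directly from motion data instead of identifying $M$, $C$, and $g$ explicitly.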
-
zkFuzz: Foundation and Framework for Effective Fuzzing of Zero-Knowledge Circuits
Authors:
Hideaki Takahashi,
Jihwan Kim,
Suman Jana,
Junfeng Yang
Abstract:
Zero-knowledge (ZK) circuits enable privacy-preserving computations and are central to many cryptographic protocols. Systems like Circom simplify ZK development by combining witness computation and circuit constraints in one program. However, even small errors can compromise the security of ZK programs: under-constrained circuits may accept invalid witnesses, while over-constrained ones may reject valid ones. Static analyzers are often imprecise, with high false-positive rates, and formal tools struggle with real-world circuit scale. Additionally, existing tools overlook several critical behaviors, such as intermediate computations and program aborts, and thus miss many vulnerabilities.
Our theoretical contribution is the Trace-Constraint Consistency Test (TCCT), a foundational language-independent formulation of ZK circuit bugs that defines bugs as discrepancies between the execution traces of the computation and the circuit constraints. TCCT captures both intermediate computations and program aborts, detecting bugs that elude prior tools.
Our systems contribution is zkFuzz, a novel program mutation-based fuzzing framework for detecting TCCT violations. zkFuzz systematically mutates the computational logic of ZK programs guided by a novel fitness function, and injects carefully crafted inputs using tailored heuristics to expose bugs. We evaluated zkFuzz on 354 real-world ZK circuits written in Circom, a leading programming system for ZK development. zkFuzz successfully identified 66 bugs, including 38 zero-days, 18 of which were confirmed by developers and 6 fixed, earning bug bounties.
Submitted 16 April, 2025;
originally announced April 2025.
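The under-constrained and over-constrained cases described in the abstract can be illustrated with a toy example. The sketch below is not zkFuzz or the TCCT formalism itself: it brute-forces a tiny prime field instead of fuzzing, and the gadget, the field size `P`, and every function name are hypothetical choices made only for illustration.

```python
P = 7  # tiny prime field, chosen only so exhaustive search is feasible


def compute(x):
    """Witness computation for an IsZero-style gadget: 1 if x == 0, else 0."""
    return 1 if x % P == 0 else 0


def constraints_under(x, y):
    """Hypothetical buggy constraints: only x * y == 0 is enforced."""
    return (x * y) % P == 0


def constraints_over(x, y):
    """Hypothetical buggy constraints: additionally demand x != 0."""
    return y % P == 0 and x % P != 0


def find_discrepancies(compute_fn, constraint_fn, p=P):
    """Compare the computation's output with what the constraints accept.

    Returns (under, over):
      under: (x, y) pairs the constraints accept although y != compute_fn(x)
      over:  inputs x whose computed output the constraints reject
    """
    under, over = [], []
    for x in range(p):
        expected = compute_fn(x)
        if not constraint_fn(x, expected):
            over.append(x)
        for y in range(p):
            if y != expected and constraint_fn(x, y):
                under.append((x, y))
    return under, over


under, over = find_discrepancies(compute, constraints_under)
print("under-constrained witnesses:", under)

under2, over2 = find_discrepancies(compute, constraints_over)
print("over-constrained inputs:", over2)
```

With the first constraint set, any output is accepted when the input is zero, so a prover could claim a wrong result. With the second, the correct output for a zero input is rejected. Exhaustive enumeration only works at this toy scale, which is one reason a fuzzing approach is attractive for real circuits.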
-
Cross-cultural Deployment of Autonomous Vehicles Using Data-light Inverse Reinforcement Learning
Authors:
Hongliang Lu,
Shuqi Shen,
Junjie Yang,
Chao Lu,
Xinhu Zheng,
Hai Yang
Abstract:
Beyond adherence to specific traffic regulations, driving culture involves something more implicit: an informal, conventional, collective behavioral pattern followed by drivers that varies across countries, regions, and even cities. Such cultural divergence has become one of the biggest challenges in deploying autonomous vehicles (AVs) across diverse regions today. The recent emergence of data-driven methods offers a potential solution, enabling culture-compatible driving through learning from data, but what if some underdeveloped regions cannot provide sufficient local data to inform driving culture? This issue is particularly significant for a broader global AV market. Here, we propose a cross-cultural deployment scheme for AVs, called data-light inverse reinforcement learning, designed to re-calibrate culture-specific AVs and assimilate them into other cultures. First, we report the divergence in driving cultures through a comprehensive comparative analysis of naturalistic driving datasets on highways from three countries: Germany, China, and the USA. Then, we demonstrate the effectiveness of our scheme by testing expeditious cross-cultural deployment across these three countries, with a cumulative testing mileage of over 56,084 km. The performance is particularly advantageous when cross-cultural deployment is carried out without abundant local data. Results show that we can reduce the dependence on local data by up to 98.67%. This study is expected to foster a broader, fairer global AV market, particularly in regions that lack enough local data to develop culture-compatible AVs.
Submitted 18 April, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.