-
Swapping objectives accelerates Davis-Yin splitting
Authors:
Edward Duc Hien Nguyen,
Jaewook J. Suh,
Xin Jiang,
Shiqian Ma
Abstract:
In this work, we investigate the application of Davis-Yin splitting (DYS) to convex optimization problems and demonstrate that swapping the roles of the two nonsmooth convex functions can result in a faster convergence rate. Such a swap typically yields a different sequence of iterates, but its impact on convergence behavior has been largely understudied or often overlooked. We address this gap by establishing best-known convergence rates for DYS and its swapped counterpart, using the primal--dual gap function as the performance metric. Our results indicate that variants of the Douglas--Rachford splitting algorithm (a special case of DYS) share the same worst-case rate, whereas the convergence rates of the two DYS variants differ. This discrepancy is further illustrated through concrete examples.
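To make the "swap" concrete, here is a minimal sketch of the standard three-operator Davis-Yin iteration for minimizing f(x) + g(x) + h(x), with h smooth. The toy problem (an l1 penalty, a nonnegativity constraint, and a quadratic h), the function names, and the step size are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce its convergence-rate results.

```python
import numpy as np

def davis_yin(prox_first, prox_second, grad_h, z0, gamma, iters=200):
    """Davis-Yin splitting; the two prox arguments are the roles that can be swapped."""
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(iters):
        x1 = prox_first(z, gamma)                       # prox of the first nonsmooth term
        x2 = prox_second(2.0 * x1 - z - gamma * grad_h(x1), gamma)  # reflected, gradient-corrected step
        z = z + (x2 - x1)                               # update with relaxation parameter 1
    return prox_first(z, gamma)

# Hypothetical toy problem: minimize 0.5*||x - b||^2 + mu*||x||_1 subject to x >= 0.
mu = 0.5
b = np.array([2.0, -1.0, 0.3])

def prox_l1(v, gamma):
    # Soft-thresholding: prox of gamma * mu * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - gamma * mu, 0.0)

def proj_nonneg(v, gamma):
    # Projection onto the nonnegative orthant (prox of its indicator function)
    return np.maximum(v, 0.0)

def grad_h(x):
    # Gradient of h(x) = 0.5*||x - b||^2, Lipschitz constant 1, so gamma must lie in (0, 2)
    return x - b

x_orig = davis_yin(proj_nonneg, prox_l1, grad_h, np.zeros(3), gamma=1.0)
x_swap = davis_yin(prox_l1, proj_nonneg, grad_h, np.zeros(3), gamma=1.0)
```

Both orderings reach the same minimizer on this toy problem, max(b - mu, 0) componentwise, while the intermediate sequences z differ; the abstract's claim concerns how the worst-case rates of the two orderings compare.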
Submitted 29 June, 2025;
originally announced June 2025.
-
Tension-Induced Soft Stress and Viscoelastic Bending in Liquid Crystal Elastomers for Enhanced Energy Dissipation
Authors:
Beijun Shen,
Yuefeng Jiang,
Christopher M. Yakacki,
Sung Hoon Kang,
Thao D. Nguyen
Abstract:
Architected materials that harness elastic snap-through buckling can trap energy reversibly. Liquid crystal elastomers (LCEs) exhibit excellent dissipation capabilities due to polymer network viscoelasticity and rate-dependent soft stress behavior associated with mesogen rotation. Incorporating LCEs into buckling lattice structures enhances energy absorption; however, conventional design cannot take advantage of the dissipation mechanism associated with mesogen rotation because buckling occurs at strains below the threshold of the soft stress response. In this study, we investigate tension-induced mesogen rotation as an additional dissipation mechanism in horizontal members of structures composed of tilted LCE beams under compression. Viscoelastic properties of LCEs with two crosslinking densities were characterized experimentally, and a nonlinear viscoelastic user-defined element was implemented in Abaqus/Standard to capture finite-strain behavior, including soft stress effects. Simulations and experiments revealed a non-monotonic dependence of energy dissipation on the thickness ratio between horizontal and tilted LCE members. Optimized structures with stretchable horizontal bars dissipated 2-3 times more energy than rigid-bar counterparts by balancing tension-driven soft stress with viscoelastic beam bending. Energy contributions from mesogen rotation and polymer network viscoelasticity were quantified. These findings inform the design strategies for LCE-based architected materials to enhance dissipation.
Submitted 29 June, 2025;
originally announced June 2025.
-
Breaking a Logarithmic Barrier in the Stopping Time Convergence Rate of Stochastic First-order Methods
Authors:
Yasong Feng,
Yifan Jiang,
Tianyu Wang,
Zhiliang Ying
Abstract:
This work provides a novel convergence analysis for stochastic optimization in terms of stopping times, addressing the practical reality that algorithms are often terminated adaptively based on observed progress. Unlike prior approaches, our analysis: 1. Directly characterizes convergence in terms of stopping times adapted to the underlying stochastic process. 2. Breaks a logarithmic barrier in existing results. Key to our results is the development of a Grönwall-type argument tailored to such stochastic processes. This tool enables sharper bounds without restrictive assumptions.
Submitted 29 June, 2025;
originally announced June 2025.
-
Band-Gap Tunability in Anharmonic Perovskite-like Semiconductors Driven by Polar Electron-Phonon Coupling
Authors:
Pol Benítez,
Ruoshi Jiang,
Siyu Chen,
Cibrán López,
Josep-Lluís Tamarit,
Edgardo Saucedo,
Bartomeu Monserrat,
Claudio Cazorla
Abstract:
The ability to finely tune optoelectronic properties in semiconductors is crucial for the development of advanced technologies, ranging from photodetectors to photovoltaics. In this work, we propose a novel strategy to achieve such tunability by utilizing electric fields to excite low-energy polar optical phonon modes, which strongly couple to electronic states in anharmonic semiconductors. We conducted a high-throughput screening of over $10,000$ materials, focusing on centrosymmetric compounds with imaginary polar phonon modes and suitable band gaps, and identified $310$ promising candidates with potential for enhanced optoelectronic tunability. From this set, three perovskite-like compounds --Ag$_3$SBr, BaTiO$_3$, and PbHfO$_3$-- were selected for in-depth investigation based on their contrasting band-gap behavior with temperature. Using first-principles calculations, \textit{ab initio} molecular dynamics simulations, tight-binding models, and anharmonic Fröhlich theory, we analyzed the underlying physical mechanisms. Our results show that polar phonon distortions can induce substantial band-gap modulations at ambient conditions, including reductions of up to $70\%$ in Ag$_3$SBr and increases of nearly $23\%$ in BaTiO$_3$, relative to values calculated at zero temperature, while PbHfO$_3$ exhibits minimal change. These contrasting responses arise from distinct electron-phonon coupling mechanisms and orbital hybridization at the band edges. This work establishes key design principles for harnessing polar lattice dynamics to engineer tunable optoelectronic properties, paving the way for adaptive technologies such as wavelength-selective optical devices and solar absorbers.
Submitted 29 June, 2025;
originally announced June 2025.
-
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Authors:
Zhenhua Ning,
Zhuotao Tian,
Shaoshuai Shi,
Guangming Lu,
Daojing He,
Wenjie Pei,
Li Jiang
Abstract:
Recent advances in point cloud perception have demonstrated remarkable progress in scene understanding through vision-language alignment leveraging large language models (LLMs). However, existing methods may still encounter challenges in handling complex instructions that require accurate spatial reasoning, even if the 3D point cloud data provides detailed spatial cues such as size and position for identifying the targets. To tackle this issue, we propose Relevant Reasoning Segmentation (R$^2$S), a reasoning-based segmentation framework. The framework emulates human cognitive processes by decomposing spatial reasoning into two sequential stages: first identifying relevant elements, then processing instructions guided by their associated visual priors. Furthermore, acknowledging the inadequacy of existing datasets in complex reasoning tasks, we introduce 3D ReasonSeg, a reasoning-based segmentation dataset comprising 25,185 training samples and 3,966 validation samples with precise annotations. Both quantitative and qualitative experiments demonstrate that the R$^2$S and 3D ReasonSeg effectively endow 3D point cloud perception with stronger spatial reasoning capabilities, and we hope that they can serve as a new baseline and benchmark for future work.
Submitted 29 June, 2025;
originally announced June 2025.
-
Hierarchical Corpus-View-Category Refinement for Carotid Plaque Risk Grading in Ultrasound
Authors:
Zhiyuan Zhu,
Jian Wang,
Yong Jiang,
Tong Han,
Yuhao Huang,
Ang Zhang,
Kaiwen Yang,
Mingyuan Luo,
Zhe Liu,
Yaofei Duan,
Dong Ni,
Tianhong Tang,
Xin Yang
Abstract:
Accurate carotid plaque grading (CPG) is vital to assess the risk of cardiovascular and cerebrovascular diseases. Due to the small size and high intra-class variability of plaque, CPG is commonly evaluated using a combination of transverse and longitudinal ultrasound views in clinical practice. However, most existing deep learning-based multi-view classification methods focus on feature fusion across different views, neglecting the importance of representation learning and the difference in class features. To address these issues, we propose a novel Corpus-View-Category Refinement Framework (CVC-RF) that processes information from Corpus-, View-, and Category-levels, enhancing model performance. Our contribution is four-fold. First, to the best of our knowledge, ours is the first deep learning-based method for CPG according to the latest Carotid Plaque-RADS guidelines. Second, we propose a novel center-memory contrastive loss, which enhances the network's global modeling capability by comparing with representative cluster centers and diverse negative samples at the Corpus level. Third, we design a cascaded down-sampling attention module to fuse multi-scale information and achieve implicit feature interaction at the View level. Finally, a parameter-free mixture-of-experts weighting strategy is introduced to leverage class clustering knowledge to weight different experts, enabling feature decoupling at the Category level. Experimental results indicate that CVC-RF effectively models global features via multi-level refinement, achieving state-of-the-art performance in the challenging CPG task.
Submitted 29 June, 2025;
originally announced June 2025.
-
Existence and Nonexistence of Extremals for Trudinger-Moser inequalities with $L^p$ type perturbation on any bounded planar domains
Authors:
Lu Chen,
Rou Jiang,
Guozhen Lu,
Maochun Zhu
Abstract:
In this study, we investigate the perturbed Trudinger-Moser inequalities as follows: \[ S_Ω(λ,p)=\sup_{u\in H_{0}^{1}(Ω),\Vert\nabla u\Vert _{L^{2}\left( Ω\right) }\leq 1}\int_Ω\left( e^{4πu^{2}}-λ|u|^{p}\right) dx, \] where $1\leq p<\infty$ and $Ω$ is a bounded domain in $\mathbb{R}^2$. Our results demonstrate that there exists a threshold $λ^{\ast}(p)>0$ such that $S_Ω(λ,p)$ is attainable if $λ<λ^{\ast}(p)$, but unattainable if $λ>λ^{\ast}(p)$ when $p\in[1,2]$. For $p>2$, however, we show that $S_Ω(λ,p)$ is always attainable for any $λ\in \mathbb{R}$. These results are achieved through a refined blow-up analysis, which allows us to establish a sharp Dirichlet energy expansion formula for sequences of solutions to the corresponding Euler-Lagrange equations. The asymmetric nature of our problem poses significant challenges to our analysis. To address these, we establish an appropriate comparison principle between radial and non-radial solutions of the associated Euler-Lagrange equations. Our study establishes a complete characterization of how $L^p$-type perturbations influence the existence of extremals for critical Trudinger-Moser inequalities on any bounded planar domain; this extends the classical Brezis-Nirenberg problem framework to the two-dimensional setting.
Submitted 28 June, 2025;
originally announced June 2025.
-
Average quantile regression: a new non-mean regression model and coherent risk measure
Authors:
Rong Jiang,
M. C. Jones,
Keming Yu,
Jiangfeng Wang
Abstract:
Regression models that go beyond the mean, alongside coherent risk measures, have been important tools in modern data analysis. This paper introduces the innovative concept of Average Quantile Regression (AQR), which is smooth at the quantile-like level, comonotonically additive, and explicitly accounts for the severity of tail losses relative to quantile regression. AQR serves as a versatile regression model capable of describing distributional information across all positions, akin to quantile regression, yet offering enhanced interpretability compared to expectiles. Numerous traditional regression models and coherent risk measures can be regarded as special cases of AQR. As a flexible non-parametric regression model, AQR demonstrates outstanding performance in analyzing high-dimensional and large datasets, particularly those generated by distributed systems, and provides a convenient framework for their statistical analysis. The corresponding estimators are rigorously derived, and their asymptotic properties are thoroughly developed. In a risk management context, the case study confirms AQR's effectiveness in risk assessment and portfolio optimization.
Submitted 28 June, 2025;
originally announced June 2025.
-
MARBLE: A Hard Benchmark for Multimodal Spatial Reasoning and Planning
Authors:
Yulun Jiang,
Yekun Chai,
Maria Brbić,
Michael Moor
Abstract:
The ability to process information from multiple modalities and to reason through it step-by-step remains a critical challenge in advancing artificial intelligence. However, existing reasoning benchmarks focus on text-only reasoning, or employ multimodal questions that can be answered by directly retrieving information from a non-text modality. Thus, complex reasoning remains poorly understood in multimodal domains. Here, we present MARBLE, a challenging multimodal reasoning benchmark that is designed to scrutinize multimodal language models (MLLMs) in their ability to carefully reason step-by-step through complex multimodal problems and environments. MARBLE is composed of two highly challenging tasks, M-Portal and M-Cube, that require the crafting and understanding of multistep plans under spatial, visual, and physical constraints. We find that current MLLMs perform poorly on MARBLE -- all 12 advanced models obtain near-random performance on M-Portal and 0% accuracy on M-Cube. Only in simplified subtasks do some models outperform the random baseline, indicating that complex reasoning is still a challenge for existing MLLMs. Moreover, we show that perception remains a bottleneck, where MLLMs occasionally fail to extract information from the visual inputs. By shedding light on the limitations of MLLMs, we hope that MARBLE will spur the development of the next generation of models with the ability to reason and plan across many multimodal reasoning steps.
Submitted 28 June, 2025;
originally announced June 2025.
-
Quantum Gravity Corrections to the Scalar Quasi-Normal Modes in Near-Extremal Reissner-Nordström Black Holes
Authors:
Zheng Jiang,
Jun Nian,
Caiying Shao,
Yu Tian,
Hongbao Zhang
Abstract:
We investigate quantum corrections to scalar quasi-normal modes (QNMs) in the near-extremal Reissner-Nordström black hole background with quantum correction in the near-horizon AdS$_2\times \mathrm{S}^2$ region. By performing a dimensional reduction, we obtain an effective Jackiw-Teitelboim (JT) gravity theory, whose quantum fluctuations are captured by the Schwarzian action. Using path integral techniques, we derive the quantum-corrected scalar field equation, which modifies the effective potential governing the QNMs. These corrections are extended from the near-horizon region to the full spacetime via a matching procedure. We compute the corrected QNMs using both the third-order WKB method and the Prony method and find consistent results. Our analysis reveals that quantum corrections can lead to substantial shifts in the real parts of QNM frequencies, particularly for small-mass or near-extremal black holes, while the imaginary parts remain relatively stable. This suggests that quantum gravity effects may leave observable imprints on black hole perturbation spectra, which could be potentially relevant for primordial or microscopic black holes.
Submitted 28 June, 2025;
originally announced June 2025.
-
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Authors:
Minchao Jiang,
Shunyu Jia,
Jiaming Gu,
Xiaoyuan Lu,
Guangming Zhu,
Anqi Dong,
Liang Zhang
Abstract:
3D Gaussian Splatting (3DGS) has become a workhorse for high-quality, real-time rendering in novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling, lacking deeper scene understanding while also incurring high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, the Segment Anything Model (SAM) is utilized for instance segmentation, extracting objects, and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image semantics to 3D point clouds via voting points, reducing training costs associated with high-dimensional CLIP features while preserving semantic unambiguity. Extensive experiments demonstrate the effectiveness of VoteSplat in open-vocabulary 3D instance localization, 3D point cloud understanding, click-based 3D object localization, hierarchical segmentation, and ablation studies. Our code is available at https://sy-ja.github.io/votesplat/
Submitted 28 June, 2025;
originally announced June 2025.
-
Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians
Authors:
Jun-Jee Chao,
Qingyuan Jiang,
Volkan Isler
Abstract:
Part segmentation and motion estimation are two fundamental problems for articulated object motion analysis. In this paper, we present a method to solve these two problems jointly from a sequence of observed point clouds of a single articulated object. The main challenge in our problem setting is that the point clouds are not assumed to be generated by a fixed set of moving points. Instead, each point cloud in the sequence could be an arbitrary sampling of the object surface at that particular time step. Such scenarios occur when the object undergoes major occlusions, or if the dataset is collected using measurements from multiple sensors asynchronously. In these scenarios, methods that rely on tracking point correspondences are not appropriate. We present an alternative approach based on a compact but effective representation where we represent the object as a collection of simple building blocks modeled as 3D Gaussians. We parameterize the Gaussians with time-dependent rotations, translations, and scales that are shared across all time steps. With our representation, part segmentation can be achieved by building correspondences between the observed points and the Gaussians. Moreover, the transformation of each point across time can be obtained by following the poses of the assigned Gaussian (even when the point is not observed). Experiments show that our method outperforms existing methods that solely rely on finding point correspondences. Additionally, we extend existing datasets to emulate real-world scenarios by considering viewpoint occlusions. We further demonstrate that our method is more robust to missing points as compared to existing approaches on these challenging datasets, even when some parts are completely occluded in some time-steps. Notably, our part segmentation performance outperforms the state-of-the-art method by 13% on point clouds with occlusions.
Submitted 27 June, 2025;
originally announced June 2025.
-
LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning
Authors:
Jiang Yuan,
JI Ma,
Bo Wang,
Guanzhou Ke,
Weiming Hu
Abstract:
Implicit degradation estimation-based blind super-resolution (IDE-BSR) hinges on extracting the implicit degradation representation (IDR) of the LR image and adapting it to LR image features to guide HR detail restoration. Although IDE-BSR has shown potential in dealing with noise interference and complex degradations, existing methods ignore the importance of IDR discriminability for BSR and instead over-complicate the adaptation process to improve performance, resulting in a significant increase in the model's parameters and computations. In this paper, we focus on the discriminability optimization of IDR and propose a new powerful and lightweight BSR model termed LightBSR. Specifically, we employ a knowledge distillation-based learning framework. We first introduce a well-designed degradation-prior-constrained contrastive learning technique during the teacher stage to make the model more focused on distinguishing different degradation types. Then we utilize a feature alignment technique to transfer the degradation-related knowledge acquired by the teacher to the student for practical inference. Extensive experiments demonstrate the effectiveness of IDR discriminability-driven BSR model design. The proposed LightBSR can achieve outstanding performance with minimal complexity across a range of blind SR tasks. Our code is accessible at: https://github.com/MJ-NCEPU/LightBSR.
Submitted 27 June, 2025;
originally announced June 2025.
-
Prediction of Protein Three-dimensional Structures via a Hardware-Executable Quantum Computing Framework
Authors:
Yuqi Zhang,
Yuxin Yang,
William Martin,
Kingsten Lin,
Zixu Wang,
Cheng-Chang Lu,
Weiwen Jiang,
Ruth Nussinov,
Joseph Loscalzo,
Qiang Guan,
Feixiong Cheng
Abstract:
Accurate prediction of protein active site structures remains a central challenge in structural biology, particularly for short and flexible peptide fragments where conventional methods often fail. Here, we present a quantum computing framework specifically developed for utility-level quantum processors to address this problem. Starting from an amino acid sequence, we formulate the structure prediction task as a ground-state energy minimization problem using the Variational Quantum Eigensolver (VQE). Amino acid connectivity is encoded on a tetrahedral lattice model, and structural constraints (including steric, geometric, and chirality terms) are mapped into a problem-specific Hamiltonian expressed as sparse Pauli operators. The optimization is executed via a two-stage architecture separating energy estimation and measurement decoding, allowing noise mitigation under realistic quantum device conditions. We evaluate the framework on 23 randomly selected real protein fragments from the PDBbind dataset, as well as 7 real fragments from proteins with therapeutic potential, and run the experiments on the IBM-Cleveland Clinic quantum processor. Structural predictions are benchmarked against AlphaFold3 (AF3) using identical postprocessing and docking procedures. Our quantum method outperformed AF3 in both RMSD (Root-Mean-Square Deviation) and docking efficacy. This work demonstrates, for the first time, a complete end-to-end pipeline for biologically relevant structure prediction on real quantum hardware, highlighting its engineering feasibility and practical advantage over existing classical and deep learning approaches.
Submitted 27 June, 2025;
originally announced June 2025.
-
A scanning resonator for probing quantum coherent devices
Authors:
Jared Gibson,
Zhanzhi Jiang,
Angela Kou
Abstract:
Superconducting resonators with high quality factors are extremely sensitive detectors of the complex impedance of materials and devices coupled to them. This capability has been used to measure losses in multiple different materials and, in the case of circuit quantum electrodynamics (circuit QED), has been used to measure the coherent evolution of multiple different types of qubits. Here, we report on the implementation of a scanning resonator for probing quantum coherent devices. Our scanning setup enables tunable coherent coupling to systems of interest without the need for fabricating on-chip superconducting resonators. We measure the internal quality factor of our resonator sensor in the single-photon regime to be > 10000 and demonstrate capacitive imaging using our sensor with zeptofarad sensitivity and micron spatial resolution at millikelvin temperatures. We then use our setup to characterize the energy spectrum and coherence times of multiple transmon qubits with no on-chip readout circuitry. Our work introduces a new tool for using circuit QED to measure existing and proposed qubit platforms.
Submitted 27 June, 2025;
originally announced June 2025.
-
Integrated Multimodal Sensing and Communication: Challenges, Technologies, and Architectures
Authors:
Yubo Peng,
Luping Xiang,
Kun Yang,
Feibo Jiang,
Kezhi Wang,
Christos Masouros
Abstract:
The evolution towards 6G networks requires the intelligent integration of communication and sensing capabilities to support diverse and complex applications, such as autonomous driving and immersive services. However, existing integrated sensing and communication (ISAC) systems predominantly rely on single-modal sensors as primary participants, which leads to a limited representation of environmental features and significant performance bottlenecks under the emerging requirements of 6G applications. This limitation motivates a paradigm shift from single-modal to multimodal ISAC. In this article, we first analyze the key challenges in realizing multimodal ISAC, including the fusion of heterogeneous multimodal data, the high communication overhead among distributed sensors, and the design of efficient and scalable system architectures. We then introduce several enabling technologies, such as large AI models, semantic communication, and multi-agent systems, that hold promise for addressing these challenges. To operationalize these technologies, we zoom into three architectural paradigms: fusion-based multimodal ISAC (F-MAC), interaction-based multimodal ISAC (I-MAC), and relay-based multimodal ISAC (R-MAC), each tailored to organize devices and modalities for efficient collaboration in different scenarios. Thereafter, a case study is presented based on the F-MAC scheme, demonstrating that the scheme achieves more comprehensive sensing and improves sensing accuracy by approximately 80% compared to conventional single-modal ISAC systems. Finally, we discuss several open issues to be addressed in the future.
Submitted 25 June, 2025;
originally announced June 2025.
-
Zero-Shot EEG-to-Gait Decoding via Phase-Aware Representation Learning
Authors:
Xi Fu,
Weibang Jiang,
Rui Liu,
Gernot R. Müller-Putz,
Cuntai Guan
Abstract:
Accurate decoding of lower-limb motion from EEG signals is essential for advancing brain-computer interface (BCI) applications in movement intent recognition and control. However, challenges persist in achieving causal, phase-consistent predictions and in modeling both inter- and intra-subject variability. To address these issues, we propose NeuroDyGait, a domain-generalizable EEG-to-motion decoding framework that leverages structured contrastive representation learning and relational domain modeling. The proposed method employs relative contrastive learning to achieve semantic alignment between EEG and motion embeddings. Furthermore, a multi-cycle gait reconstruction objective is introduced to enforce temporal coherence and maintain biomechanical consistency. To promote inter-session generalization, during fine-tuning, a domain dynamic decoding mechanism adaptively assigns session-specific prediction heads and learns to mix their outputs based on inter-session relationships. NeuroDyGait enables zero-shot motion prediction for unseen individuals without requiring adaptation and achieves superior performance in cross-subject gait decoding on benchmark datasets. Additionally, it demonstrates strong phase-detection capabilities even without explicit phase supervision during training. These findings highlight the potential of relational domain learning in enabling scalable, target-free deployment of BCIs.
Submitted 24 June, 2025;
originally announced June 2025.
-
The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements
Authors:
Bingchen Zhao,
Despoina Magka,
Minqi Jiang,
Xian Li,
Roberta Raileanu,
Tatiana Shavrina,
Jean-Christophe Gagnon-Audet,
Kelvin Niu,
Shagun Sodhani,
Michael Shvartsman,
Andrei Lupu,
Alisia Lupidi,
Edan Toledo,
Karen Hambardzumyan,
Martin Josifoski,
Thomas Foster,
Lucia Cipolina-Kun,
Abhishek Charnalia,
Derek Dunfield,
Alexander H. Miller,
Oisin Mac Aodha,
Jakob Foerster,
Yoram Bachrach
Abstract:
Rapid advancements in large language models (LLMs) have the potential to assist in scientific progress. A critical capability toward this endeavor is the ability to reproduce existing work. To evaluate the ability of AI agents to reproduce results in an active research area, we introduce the Automated LLM Speedrunning Benchmark, leveraging the research community contributions on the NanoGPT speedrun, a competition to train a GPT-2 model in the shortest time. Each of the 19 speedrun tasks provides the agent with the previous record's training script, optionally paired with one of three hint formats, ranging from pseudocode to paper-like descriptions of the new record's improvements. Records execute quickly by design and speedrun improvements encompass diverse code-level changes, ranging from high-level algorithmic advancements to hardware-aware optimizations. These features make the benchmark both accessible and realistic for the frontier problem of improving LLM training. We find that recent reasoning LLMs combined with SoTA scaffolds struggle to reimplement already-known innovations in our benchmark, even when given detailed hints. Our benchmark thus provides a simple, non-saturated measure of an LLM's ability to automate scientific reproduction, a necessary (but not sufficient) skill for an autonomous research agent.
Submitted 27 June, 2025;
originally announced June 2025.
-
Reward Balancing Revisited: Enhancing Offline Reinforcement Learning for Recommender Systems
Authors:
Wenzheng Shu,
Yanxiang Zeng,
Yongxiang Tang,
Teng Sha,
Ning Luo,
Yanhua Cheng,
Xialong Liu,
Fan Zhou,
Peng Jiang
Abstract:
Offline reinforcement learning (RL) has emerged as a prevalent and effective methodology for real-world recommender systems, enabling learning policies from historical data and capturing user preferences. In offline RL, reward shaping encounters significant challenges, with past efforts to incorporate prior strategies for uncertainty to improve world models or penalize underexplored state-action pairs. Despite these efforts, a critical gap remains: the simultaneous balancing of intrinsic biases in world models and the diversity of policy recommendations. To address this limitation, we present an innovative offline RL framework termed Reallocated Reward for Recommender Systems (R3S). By integrating inherent model uncertainty to tackle the intrinsic fluctuations in reward predictions, we boost diversity for decision-making to align with a more interactive paradigm, incorporating extra penalizers with decay that deter actions leading to diminished state variety at both local and global scales. The experimental results demonstrate that R3S improves the accuracy of world models and efficiently harmonizes the heterogeneous preferences of the users.
Submitted 27 June, 2025;
originally announced June 2025.
-
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Authors:
Zipei Ma,
Junzhe Jiang,
Yurui Chen,
Li Zhang
Abstract:
The realistic reconstruction of street scenes is critical for developing real-world simulators in autonomous driving. Most existing methods rely on object pose annotations, using these poses to reconstruct dynamic objects and move them during the rendering process. This dependence on high-precision object annotations limits large-scale and extensive scene reconstruction. To address this challenge, we propose Bézier curve Gaussian splatting (BézierGS), which represents the motion trajectories of dynamic objects using learnable Bézier curves. This approach fully leverages the temporal information of dynamic objects and, through learnable curve modeling, automatically corrects pose errors. By introducing additional supervision on dynamic object rendering and inter-curve consistency constraints, we achieve reasonable and accurate separation and reconstruction of scene elements. Extensive experiments on the Waymo Open Dataset and the nuPlan benchmark demonstrate that BézierGS outperforms state-of-the-art alternatives in both dynamic and static scene components reconstruction and novel view synthesis.
Submitted 27 June, 2025;
originally announced June 2025.
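As an illustration of the curve representation named in the BézierGS entry above, the following is a minimal sketch of evaluating a Bézier curve from its control points with the Bernstein basis. It is not the authors' implementation: the function name `bezier_point`, the 2D control points, and the sampling step are all hypothetical, and in the paper's setting the control points would be learnable parameters rather than fixed values.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])         # spatial dimension of each point
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        w = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        for d in range(dim):
            point[d] += w * p[d]
    return tuple(point)


# Hypothetical control points for one object's trajectory (illustrative only).
ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

# Sample the trajectory at 11 evenly spaced normalized timestamps.
trajectory = [bezier_point(ctrl, k / 10) for k in range(11)]
```

The curve passes through the first and last control points, so a trajectory's start and end positions are read directly from the parameters, while the interior control points shape the path in between.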
-
Nonlinear Power Amplifier-Resilient Cell-Free Massive MIMO: A Joint Optimization Approach
Authors:
Wei Jiang,
Hans D. Schotten
Abstract:
This letter analyzes the effects of power amplifiers (PAs) on the downlink of cell-free massive MIMO systems. We model signal transmission incorporating nonlinear PA distortion and derive a unified spectral efficiency (SE) expression applicable to arbitrary precoding schemes. To combat PA-induced performance degradation, a joint optimization approach for user association and max-min power control is proposed. Furthermore, a low-complexity alternative is developed to approximate the joint optimization with reduced computational overhead. Simulations validate the analysis and demonstrate significant performance gains of the proposed approaches over conventional techniques.
Submitted 27 June, 2025;
originally announced June 2025.
-
Updated measurement of $CP$ violation and polarisation in $B^0_s \rightarrow J/ψ\overline{K}{}^{*}\kern-1pt(892)^{0}$ decays
Authors:
LHCb collaboration,
R. Aaij,
A. S. W. Abdelmotteleb,
C. Abellan Beteta,
F. Abudinén,
T. Ackernley,
A. A. Adefisoye,
B. Adeva,
M. Adinolfi,
P. Adlarson,
C. Agapopoulou,
C. A. Aidala,
Z. Ajaltouni,
S. Akar,
K. Akiba,
P. Albicocco,
J. Albrecht,
R. Aleksiejunas,
F. Alessio,
Z. Aliouche,
P. Alvarez Cartelle,
R. Amalric,
S. Amato,
J. L. Amey,
Y. Amhis
, et al. (1168 additional authors not shown)
Abstract:
A time-integrated angular analysis of the decay $B^0_s \rightarrow J/ψ\overline{K}{}^{*}\kern-1pt(892)^{0}$, with $J/ψ\rightarrow μ^{+} μ^{-}$ and $\overline{K}{}^{*}\kern-1pt(892)^{0} \rightarrow K^{-} π^{+}$, is presented. The analysis employs a sample of proton-proton collision data collected by the LHCb experiment during 2015-2018 at a centre-of-mass energy of $13 \text{TeV}$, corresponding to an integrated luminosity of $6 \text{fb}^{-1}$. A simultaneous maximum-likelihood fit is performed to the angular distributions in bins of the $K^{-} π^{+}$ mass. This fit yields measurements of the $CP$-averaged polarisation fractions and $CP$ asymmetries for the P-wave component of the $K^{-} π^{+}$ system. The longitudinal and parallel polarisation fractions are determined to be $f_{0} = 0.534 \pm 0.012 \pm 0.009$ and $f_{\parallel} = 0.211 \pm 0.014 \pm 0.005$, respectively, where the first uncertainty is statistical and the second is systematic. The $CP$ asymmetries are measured with $3$-$7\%$ precision and are found to be consistent with zero. These measurements, along with an updated determination of the branching fraction relative to the $B^0 \rightarrow J/ψK^{*0}$ decay, are combined with previous LHCb results, providing the most precise values for these observables to date.
Submitted 27 June, 2025;
originally announced June 2025.
-
Universal Retrieval for Multimodal Trajectory Modeling
Authors:
Xuan Zhang,
Ziyan Jiang,
Rui Meng,
Yifei Leng,
Zhenbang Xiao,
Zora Zhiruo Wang,
Yanyi Shang,
Dehan Kong
Abstract:
Trajectory data, capturing human actions and environmental states across various modalities, holds significant potential for enhancing AI agent capabilities, particularly in GUI environments. However, how to model the representation of trajectory-level data presents a significant challenge that has not been systematically addressed amid explosive trajectory data growth. In this work, we introduce Multimodal Trajectory Retrieval, bridging the gap between universal retrieval and agent-centric trajectory modeling. We construct the Unified Agent Trajectory Dataset (UATD) from annotated demonstrations and states across diverse real-world scenarios. Based on this, we present GAE-Bench, a benchmark containing a large number of trajectory-based retrieval pairs. In addition, we propose GAE-Retriever, a multimodal retrieval framework that adopts vision-language models and incorporates optimized contrastive learning through a token selection and the GradCache mechanism. Comprehensive evaluations across multiple datasets show that GAE-Retriever consistently outperforms strong baselines in retrieval recall, highlighting its effectiveness in advancing multimodal trajectory retrieval.
Submitted 27 June, 2025;
originally announced June 2025.
-
UniCA: Adapting Time Series Foundation Model to General Covariate-Aware Forecasting
Authors:
Lu Han,
Yu Liu,
Qiwen Deng,
Jian Jiang,
Yinbo Sun,
Zhe Yu,
Binfeng Wang,
Xingyu Lu,
Lintao Ma,
Han-Jia Ye,
De-Chuan Zhan
Abstract:
Time Series Foundation Models (TSFMs) have achieved remarkable success through large-scale pretraining. However, their design primarily targets real-valued series, limiting their ability to handle general forecasting tasks involving diverse and often heterogeneous covariates--such as categorical variables and multimodal data (e.g., images, text)--which are typically task-specific and difficult to leverage during pretraining. To address this gap, we propose Unified Covariate Adaptation (UniCA), a framework to bridge TSFMs with general covariate-aware forecasting. UniCA first performs covariate homogenization to transform heterogeneous covariates into high-level homogeneous series representations and then fuses them via a unified attention-based fusion mechanism. UniCA is compatible and universal for adaptation with both homogeneous and heterogeneous covariates, incorporating extra covariate information while preserving the generalization ability of TSFMs. Extensive experiments on multiple unimodal and multimodal covariate-aware forecasting benchmarks demonstrate the superiority of UniCA, highlighting the promise of covariate-aware TSFM adaptation in real-world forecasting scenarios. Code is released at https://github.com/hanlu-nju/UniCA.
Submitted 27 June, 2025;
originally announced June 2025.
-
Heterogeneous Massive MIMO: A Cost-Efficient Technique for Uniform Service in Cellular Networks
Authors:
Wei Jiang,
Hans D. Schotten
Abstract:
Massive multi-input multi-output (MIMO) has evolved along two tracks: cellular and cell-free, each with unique advantages and limitations. The cellular approach suffers from worse user spectral efficiency at cell edges, whereas the cell-free approach incurs high implementation costs due to a large-scale distributed infrastructure. This paper introduces a novel networking paradigm, termed heterogeneous massive MIMO (HmMIMO), which seamlessly integrates co-located and distributed antennas. Unlike the two conventional paradigms, HmMIMO retains a base station with a large antenna array at the center of each cell, aided by distributed antennas deployed at cell edges. Our findings demonstrate that this paradigm achieves a favorable trade-off between performance and implementation complexity.
Submitted 27 June, 2025;
originally announced June 2025.
-
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Authors:
Shuhan Tan,
John Lambert,
Hong Jeon,
Sakshum Kulshrestha,
Yijing Bai,
Jing Luo,
Dragomir Anguelov,
Mingxing Tan,
Chiyu Max Jiang
Abstract:
The goal of traffic simulation is to augment a potentially limited amount of manually-driven miles that is available for testing and validation, with a much larger amount of simulated synthetic miles. The culmination of this vision would be a generative simulated city, where given a map of the city and an autonomous vehicle (AV) software stack, the simulator can seamlessly simulate the trip from point A to point B by populating the city around the AV and controlling all aspects of the scene, from animating the dynamic agents (e.g., vehicles, pedestrians) to controlling the traffic light states. We refer to this vision as CitySim, which requires an agglomeration of simulation technologies: scene generation to populate the initial scene, agent behavior modeling to animate the scene, occlusion reasoning, dynamic scene generation to seamlessly spawn and remove agents, and environment simulation for factors such as traffic lights. While some key technologies have been separately studied in various works, others such as dynamic scene generation and environment simulation have received less attention in the research community. We propose SceneDiffuser++, the first end-to-end generative world model trained on a single loss function capable of point A-to-B simulation on a city scale integrating all the requirements above. We demonstrate the city-scale traffic simulation capability of SceneDiffuser++ and study its superior realism under long simulation conditions. We evaluate the simulation quality on an augmented version of the Waymo Open Motion Dataset (WOMD) with larger map regions to support trip-level simulation.
Submitted 27 June, 2025;
originally announced June 2025.
-
Redundant Array Computation Elimination
Authors:
Zixuan Wang,
Liang Yuan,
Xianmeng Jiang,
Kun Li,
Junmin Xiao,
Yunquan Zhang
Abstract:
Redundancy elimination is a key optimization direction, and loop nests are the main optimization target in modern compilers. Previous work on redundancy elimination of array computations in loop nests lacks universality. These approaches either focus on specific computation patterns or fail to recognize redundancies with complex structures. This paper proposes RACE (Redundant Array Computation Elimination), a more general redundancy elimination technique. RACE utilizes a novel two-level scheme to identify the data reuse between array references and the computation redundancies between expressions. It traverses the expression trees in loop nests to detect redundancies hierarchically in linear time and generates efficient code with optimized auxiliary arrays that store redundant computation results. Furthermore, RACE supports the expression reassociation with various aggressive strategies to improve the redundancy opportunities. Experimental results demonstrate the effectiveness of RACE.
Submitted 27 June, 2025;
originally announced June 2025.
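To make the idea in the RACE entry above concrete, here is a small hand-written sketch of the kind of redundancy such a technique targets: two statements in a loop whose array references overlap across iterations, and an auxiliary array that stores the shared sub-expression so it is computed once. This is an illustrative example only, not output of RACE or its algorithm; the function names and the loop body are hypothetical.

```python
def naive(a):
    """Loop nest with a redundant array computation across iterations."""
    n = len(a) - 2
    b = [0.0] * n
    c = [0.0] * n
    for i in range(n):
        b[i] = (a[i] + a[i + 1]) * 2.0
        # a[i+1] + a[i+2] is recomputed as a[i] + a[i+1] in iteration i + 1.
        c[i] = (a[i + 1] + a[i + 2]) * 3.0
    return b, c


def with_auxiliary_array(a):
    """Same results, with each pairwise sum computed once and reused."""
    n = len(a) - 2
    # Auxiliary array: s[i] = a[i] + a[i+1] for every index either loop needs.
    s = [a[i] + a[i + 1] for i in range(n + 1)]
    b = [s[i] * 2.0 for i in range(n)]
    c = [s[i + 1] * 3.0 for i in range(n)]
    return b, c
```

In this sketch the naive version performs about 2n additions and the rewritten version about n + 1, at the cost of one extra array of length n + 1.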
-
Optimal Return-to-Go Guided Decision Transformer for Auto-Bidding in Advertisement
Authors:
Hao Jiang,
Yongxiang Tang,
Yanxiang Zeng,
Pengjia Yuan,
Yanhua Cheng,
Teng Sha,
Xialong Liu,
Peng Jiang
Abstract:
In the realm of online advertising, advertisers partake in ad auctions to obtain advertising slots, frequently taking advantage of auto-bidding tools provided by demand-side platforms. To improve the automation of these bidding systems, we adopt generative models, namely the Decision Transformer (DT), to tackle the difficulties inherent in automated bidding. Applying the Decision Transformer to the auto-bidding task enables a unified approach to sequential modeling, which efficiently overcomes short-sightedness by capturing long-term dependencies between past bidding actions and user behavior. Nevertheless, conventional DT has certain drawbacks: (1) DT necessitates a preset return-to-go (RTG) value before generating actions, which is not inherently produced; (2) The policy learned by DT is restricted by its training data, which consists of mixed-quality trajectories. To address these challenges, we introduce the R* Decision Transformer (R* DT), developed in a three-step process: (1) R DT: Similar to traditional DT, R DT stores actions based on state and RTG value, as well as memorizing the RTG for a given state using the training set; (2) R^ DT: We forecast the highest value (within the training set) of RTG for a given state, deriving a suboptimal policy based on the current state and the forecasted supreme RTG value; (3) R* DT: Based on R^ DT, we generate trajectories and select those with high rewards (using a simulator) to augment our training dataset. This data enhancement has been shown to improve the RTG of trajectories in the training data and gradually leads the suboptimal policy towards optimality. Comprehensive tests on a publicly available bidding dataset validate the R* DT's efficacy and highlight its superiority when dealing with mixed-quality trajectories.
Submitted 27 June, 2025;
originally announced June 2025.
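The return-to-go (RTG) quantity referred to in the entry above is, in the standard Decision Transformer setup, the sum of rewards from the current step to the end of the trajectory. The sketch below computes it for a single trajectory; it assumes the undiscounted definition and uses a hypothetical function name, and it does not reproduce the R, R^ or R* DT variants described in the abstract.

```python
def returns_to_go(rewards):
    """Return rtg where rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each suffix sum costs O(1).
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg
```

At inference time a conventional DT must be handed an initial RTG target, since no future rewards are known yet; that is the preset value the abstract lists as drawback (1).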
-
Prediction of A15 Tilt Grain Boundary Structures
Authors:
Wenwen Zou,
Zihan Su,
Juan Zhang,
Kai Jiang
Abstract:
In this work, we present a theoretical method to predict all coincidence site lattice (CSL) tilt grain boundaries (GBs) in A15, especially high-$Σ$ CSL GBs. This method includes a modified Farey diagram (MFD) and a computational framework based on the 3D phase field crystal model. Applied to [001] CSL symmetric tilt grain boundaries (STGBs) in A15, this method identifies building blocks of A15 GBs, known as structural units (SUs). The MFD predicts the quantity and proportion of SUs within GBs. The developed computational approach further determines the arrangement of these SUs. The predictive rule reveals the SU arrangement of A15 [001] CSL STGBs.
Submitted 27 June, 2025;
originally announced June 2025.
-
Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning
Authors:
Fangling Jiang,
Qi Li,
Weining Wang,
Gang Wang,
Bing Liu,
Zhenan Sun
Abstract:
Recently, the emergence of novel presentation attacks has drawn increasing attention to face anti-spoofing. However, existing methods tend to memorize data patterns from the training set, resulting in poor generalization to unknown attack types across different scenarios and limited interpretability. To address these challenges, this paper presents a reinforcement fine-tuning-based face anti-spoofing method that stimulates the capabilities of multimodal large language models to think and learn how to solve the anti-spoofing task itself, rather than relying on the memorization of authenticity patterns. We design verifiable class-consistent and reasoning-consistent rewards, and employ a GRPO-based optimization strategy to guide the model in exploring reasoning policies from multiple perspectives to maximize expected rewards. As a result, through iterative trial-and-error learning while retaining only high-reward trajectories, the model distills highly generalizable decision-making rules from the extensive solution space to effectively address cross-domain face anti-spoofing tasks. Extensive experimental results demonstrate that our method achieves state-of-the-art cross-domain generalization performance. It generalizes well to diverse unknown attack types in unseen target domains while providing interpretable reasoning for its authenticity decisions without requiring labor-intensive textual annotations for training.
Submitted 27 June, 2025;
originally announced June 2025.
-
TUS-REC2024: A Challenge to Reconstruct 3D Freehand Ultrasound Without External Tracker
Authors:
Qi Li,
Shaheer U. Saeed,
Yuliang Huang,
Mingyuan Luo,
Zhongnuo Yan,
Jiongquan Chen,
Xin Yang,
Dong Ni,
Nektarios Winter,
Phuc Nguyen,
Lucas Steinberger,
Caelan Haney,
Yuan Zhao,
Mingjie Jiang,
Bowen Ren,
SiYeoul Lee,
Seonho Kim,
MinKyung Seo,
MinWoo Kim,
Yimeng Dou,
Zhiwei Zhang,
Yin Li,
Tomy Varghese,
Dean C. Barratt,
Matthew J. Clarkson
, et al. (2 additional authors not shown)
Abstract:
Trackerless freehand ultrasound reconstruction aims to reconstruct 3D volumes from sequences of 2D ultrasound images without relying on external tracking systems, offering a low-cost, portable, and widely deployable alternative for volumetric imaging. However, it presents significant challenges, including accurate inter-frame motion estimation, minimisation of drift accumulation over long sequences, and generalisability across scanning protocols. The TUS-REC2024 Challenge was established to benchmark and accelerate progress in trackerless 3D ultrasound reconstruction by providing a publicly available dataset for the first time, along with a baseline model and evaluation framework. The Challenge attracted over 43 registered teams, of which 6 teams submitted 21 valid dockerized solutions. Submitted methods spanned a wide range of algorithmic approaches, including recurrent models, registration-driven volume refinement, attention, and physics-informed models. This paper presents an overview of the Challenge design, summarises the key characteristics of the dataset, provides a concise literature review, introduces the technical details of the underlying methodology working with tracked freehand ultrasound data, and offers a comparative analysis of submitted methods across multiple evaluation metrics. The results highlight both the progress and current limitations of state-of-the-art approaches in this domain, and inform directions for future research. The data, evaluation code, and baseline are publicly available to facilitate ongoing development and reproducibility. As a live and evolving benchmark, this Challenge is designed to be continuously developed and improved. The Challenge was held at MICCAI 2024 and will be organised again at MICCAI 2025, reflecting its growing impact and the sustained commitment to advancing this field.
Submitted 26 June, 2025;
originally announced June 2025.
-
Do We Really Need GNNs with Explicit Structural Modeling? MLPs Suffice for Language Model Representations
Authors:
Li Zhou,
Hao Jiang,
Junjie Li,
Zefeng Zhao,
Feng Jiang,
Wenyu Chen,
Haizhou Li
Abstract:
Explicit structural information has been proven to be encoded by Graph Neural Networks (GNNs), serving as auxiliary knowledge to enhance model capabilities and improve performance in downstream NLP tasks. However, recent studies indicate that GNNs fail to fully utilize structural information, whereas Multi-Layer Perceptrons (MLPs), despite lacking the message-passing mechanisms inherent to GNNs, exhibit a surprising ability in structure-aware tasks. Motivated by these findings, this paper introduces a comprehensive probing framework from an information-theoretic perspective. The framework is designed to systematically assess the role of explicit structural modeling in enhancing language model (LM) representations and to investigate the potential of MLPs as efficient and scalable alternatives to GNNs. We extend traditional probing classifiers by incorporating a control module that allows for selective use of either the full GNN model or its decoupled components, specifically, the message-passing and feature-transformation operations. This modular approach isolates and assesses the individual contributions of these operations, avoiding confounding effects from the complete GNN architecture. Using the Edge Probing Suite, a diagnostic tool for evaluating the linguistic knowledge encoded in LMs, we find that MLPs, when used as feature-transformation modules, consistently improve the linguistic knowledge captured in LM representations across different architectures. They effectively encode both syntactic and semantic patterns. Similarly, GNNs that incorporate feature-transformation operations show beneficial effects. In contrast, models that rely solely on message-passing operations tend to underperform, often leading to negative impacts on probing task performance.
Submitted 26 June, 2025;
originally announced June 2025.
-
Representation Consistency for Accurate and Coherent LLM Answer Aggregation
Authors:
Junqi Jiang,
Tom Bewley,
Salim I. Amoukou,
Francesco Leofante,
Antonio Rago,
Saumitra Mishra,
Francesca Toni
Abstract:
Test-time scaling improves large language models' (LLMs) performance by allocating more compute budget during inference. To achieve this, existing methods often require intricate modifications to prompting and sampling strategies. In this work, we introduce representation consistency (RC), a test-time scaling method for aggregating answers drawn from multiple candidate responses of an LLM regardless of how they were generated, including variations in prompt phrasing and sampling strategy. RC enhances answer aggregation by not only considering the number of occurrences of each answer in the candidate response set, but also the consistency of the model's internal activations while generating the set of responses leading to each answer. These activations can be either dense (raw model activations) or sparse (encoded via pretrained sparse autoencoders). Our rationale is that if the model's representations of multiple responses converging on the same answer are highly variable, this answer is more likely to be the result of incoherent reasoning and should be down-weighted during aggregation. Importantly, our method only uses cached activations and lightweight similarity computations and requires no additional model queries. Through experiments with four open-source LLMs and four reasoning datasets, we validate the effectiveness of RC for improving task performance during inference, with consistent accuracy improvements (up to 4%) over strong test-time scaling baselines. We also show that consistency in the sparse activation signals aligns well with the common notion of coherent reasoning.
Submitted 18 June, 2025;
originally announced June 2025.
-
Team QUST at SemEval-2025 Task 10: Evaluating Large Language Models in Multiclass Multi-label Classification of News Entity Framing
Authors:
Jiyan Liu,
Youzheng Liu,
Taihang Wang,
Xiaoman Xu,
Yimin Wang,
Ye Jiang
Abstract:
This paper describes the participation of QUST_NLP in the SemEval-2025 Task 7. We propose a three-stage retrieval framework specifically designed for fact-checked claim retrieval. Initially, we evaluate the performance of several retrieval models and select the one that yields the best results for candidate retrieval. Next, we employ multiple re-ranking models to enhance the candidate results, with each model selecting the Top-10 outcomes. In the final stage, we utilize weighted voting to determine the final retrieval outcomes. Our approach achieved 5th place in the monolingual track and 7th place in the crosslingual track. We release our system code at: https://github.com/warmth27/SemEval2025_Task7.
Submitted 12 June, 2025;
originally announced June 2025.
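The final weighted-voting stage described in the abstract above can be illustrated with a minimal sketch. The abstract does not say how votes are weighted, so this assumes the simplest reading: each re-ranker contributes its own weight to every candidate in its top-k list, and candidates are ordered by total weight. The function name `weighted_vote`, the re-ranker names, and the weights are all hypothetical and are not taken from the authors' released code.

```python
from collections import defaultdict


def weighted_vote(rankings, weights, top_k=10):
    """Fuse several re-rankers' top-k lists into one ranking.

    Assumption (not from the paper): a candidate receives a re-ranker's
    full weight whenever it appears in that re-ranker's top-k list.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for candidate in ranked_ids[:top_k]:
            scores[candidate] += weights[name]
    # Highest total weight first; ties are broken by candidate id
    # so the output is deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical re-ranker outputs and weights, for illustration only.
rankings = {
    "reranker_a": ["c1", "c2", "c3"],
    "reranker_b": ["c2", "c4", "c1"],
    "reranker_c": ["c5", "c2", "c6"],
}
weights = {"reranker_a": 0.5, "reranker_b": 0.3, "reranker_c": 0.2}

fused = weighted_vote(rankings, weights)
print(fused)
```

A rank-sensitive variant (for example, adding `weight / position` instead of the flat weight) would be an equally plausible reading of the abstract; the flat version is used here only because it needs the fewest assumptions.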
-
WorldVLA: Towards Autoregressive Action World Model
Authors:
Jun Cen,
Chaohui Yu,
Hangjie Yuan,
Yuming Jiang,
Siteng Huang,
Jiayan Guo,
Xin Li,
Yibing Song,
Hao Luo,
Fan Wang,
Deli Zhao,
Hao Chen
Abstract:
We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. WorldVLA integrates a Vision-Language-Action (VLA) model and a world model in a single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action generation. Meanwhile, the action model generates the subsequent actions based on image observations, aiding visual understanding and in turn helping the visual generation of the world model. We demonstrate that WorldVLA outperforms standalone action and world models, highlighting the mutual enhancement between the world model and the action model. In addition, we find that the performance of the action model deteriorates when generating sequences of actions in an autoregressive manner. This phenomenon can be attributed to the model's limited generalization capability for action prediction, leading to the propagation of errors from earlier actions to subsequent ones. To address this issue, we propose an attention mask strategy that selectively masks prior actions during the generation of the current action, which shows significant performance improvement in the action chunk generation task.
Submitted 26 June, 2025;
originally announced June 2025.
-
DynamicBench: Evaluating Real-Time Report Generation in Large Language Models
Authors:
Jingyao Li,
Hao Sun,
Zile Qiao,
Yong Jiang,
Pengjun Xie,
Fei Huang,
Hong Xu,
Jiaya Jia
Abstract:
Traditional benchmarks for large language models (LLMs) typically rely on static evaluations through storytelling or opinion expression, which fail to capture the dynamic requirements of real-time information processing in contemporary applications. To address this limitation, we present DynamicBench, a benchmark designed to evaluate the proficiency of LLMs in storing and processing up-to-the-minute data. DynamicBench utilizes a dual-path retrieval pipeline, integrating web searches with local report databases. It requires domain-specific knowledge, ensuring accurate report generation within specialized fields. By evaluating models in scenarios that either provide or withhold external documents, DynamicBench effectively measures their capability to independently process recent information or leverage contextual enhancements. Additionally, we introduce an advanced report generation system adept at managing dynamic information synthesis. Our experimental results confirm the efficacy of our approach, with our method achieving state-of-the-art performance, surpassing GPT4o in document-free and document-assisted scenarios by 7.0% and 5.8%, respectively. The code and data will be made publicly available.
Submitted 26 June, 2025;
originally announced June 2025.
-
Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou
Authors:
Pengfei Fan,
Yuli Zhang,
Xinheng Wang,
Ruiyuan Jiang,
Hankang Gu,
Dongyao Jia,
Shangbo Wang
Abstract:
This study presents and publicly releases the Suzhou Urban Road Acoustic Dataset (SZUR-Acoustic Dataset), which is accompanied by comprehensive data-acquisition protocols and annotation guidelines to ensure transparency and reproducibility of the experimental workflow. To model the coupling between vehicular noise and driving speed, we propose a bimodal-feature-fusion deep convolutional neural network (BMCNN). During preprocessing, an adaptive denoising and normalization strategy is applied to suppress environmental background interference; in the network architecture, parallel branches extract Mel-frequency cepstral coefficients (MFCCs) and wavelet-packet energy features, which are subsequently fused via a cross-modal attention mechanism in the intermediate feature space to fully exploit time-frequency information. Experimental results demonstrate that BMCNN achieves a classification accuracy of 87.56% on the SZUR-Acoustic Dataset and 96.28% on the public IDMT-Traffic dataset. Ablation studies and robustness tests on the Suzhou dataset further validate the contributions of each module to performance improvement and overfitting mitigation. The proposed acoustics-based speed classification method can be integrated into smart-city traffic management systems for real-time noise monitoring and speed estimation, thereby optimizing traffic flow control, reducing roadside noise pollution, and supporting sustainable urban planning.
Submitted 26 June, 2025;
originally announced June 2025.
-
Optimizing Gaussian Process Kernels Using Nested Sampling and ABC Rejection for H(z) Reconstruction
Authors:
Jia-yan Jiang,
Kang Jiao,
Tong-Jie Zhang
Abstract:
Recent cosmological observations have achieved high-precision measurements of the Universe's expansion history, prompting the use of nonparametric methods such as Gaussian process (GP) regression. We apply GP regression to reconstruct the Hubble parameter using CC data, with improved covariance modeling and the latest studies of CC data. By comparing reconstructions in redshift space $z$ and transformed space $\log(z+1)$, we evaluate six kernel functions using nested sampling (NS) and approximate Bayesian computation rejection (ABC rejection) methods and analyze the construction of the Hubble constant $H_0$ in different models. Our analysis demonstrates that reconstructions in $\log(z+1)$ space remain physically reasonable, offering a viable alternative to conventional $z$ space approaches, while the introduction of nondiagonal covariance matrices leads to degraded reconstruction quality, suggesting that simplified diagonal forms may be preferable for reconstruction. These findings underscore the importance of task-specific kernel selection in GP-based cosmological inference. In particular, our findings suggest that careful preliminary screening of kernel functions, based on the physical quantities of interest, is essential for reliable inference in cosmological research using GP.
Submitted 26 June, 2025;
originally announced June 2025.
-
JUNO 20-inch PMT and electronics system characterization using large pulses of PMT dark counts at the Pan-Asia testing platform
Authors:
Caimei Liu,
Min Li,
Narongkiat Rodphai,
Zhimin Wang,
Jun Hu,
Nikolay Anfimov,
Lei Fan,
Alberto Garfagnini,
Guanghua Gong,
Shaojing Hou,
Xiaolu Ji,
Xiaoshan Jiang,
Denis Korablev,
Tobias Lachenmaier,
Si Ma,
Xiaoyan Ma,
Zhe Ning,
Alexander G. Olshevskiy,
Zhaoyuan Peng,
Zhonghua Qin,
Tobias Sterr,
Yunhua Sun,
Alexander Felix Tietzsch,
Jun Wang,
Wei Wang
, et al. (13 additional authors not shown)
Abstract:
The main goal of the JUNO experiment is to determine the neutrino mass ordering with a 20 kt liquid-scintillator detector. The 20-inch PMT and its 1F3 (one for three) electronics are crucial to realize the excellent energy resolution of at least 3% at 1 MeV. Knowledge of the PMT and 1F3 electronics response is critical for understanding detector performance. A study of the JUNO 20-inch PMT and 1F3 electronics system characterization is presented using large pulses of PMT dark counts at the Pan-Asia testing platform in China. Thanks to their broad amplitude range and high rate, the large pulse signals are also used to investigate the PMT afterpulse response.
Submitted 26 June, 2025;
originally announced June 2025.
-
Alternating Spintronics: Capacitive Behavior of Spin Valves and Resonator Applications
Authors:
Yunwen Liu,
Jiang Xiao
Abstract:
This study explores the time-dependent spin transport phenomena in magnetic heterostructures under alternating currents (AC), advancing the relatively underdeveloped field of alternating spintronics. Employing a time-dependent spin diffusion model, we show that the interplay of AC frequencies and spin relaxation times reveals significant differences in spin accumulation patterns compared to conventional direct current (DC) scenarios. Of particular interest is the emergence of capacitive-like impedance in a spin valve under AC conditions, which is especially pronounced in antiparallel spin configurations. These findings open up possibilities for developing high-frequency spintronic devices, including the proposed "spin resonator", which functions like a standard LC resonator but without a traditional capacitor.
Submitted 27 June, 2025; v1 submitted 26 June, 2025;
originally announced June 2025.
-
Complexity of PXP scars revisited
Authors:
Pawel Caputa,
Xuhao Jiang,
Sinong Liu
Abstract:
We revisit a quantum quench scenario in which either a scarring or thermalizing initial state evolves under the PXP Hamiltonian. Within this framework, we study the time evolution of spread complexity and related quantities in the Krylov basis. We find that the Lanczos coefficients $b_n$, as functions of the iteration number $n$, exhibit a characteristic arched growth and decay, followed by erratic oscillations which we refer to as the buttress. The arched profile predominantly arises from contributions within the quantum many-body scar subspace, while the buttress is linked to thermalization dynamics. To explain this behavior, we utilize the representation theory of $\mathfrak{sl}_3(\mathbb{C})$, allowing us to decompose the PXP Hamiltonian into a linear component and a residual part. The linear term governs the formation and width of the arch, and we observe that there exists a threshold of arch width which determines whether a given initial state exhibits scarring. Meanwhile, the residual term accounts qualitatively for the emergence of the buttress. We estimate an upper bound for the extent of the buttress using Lucas numbers. Finally, we demonstrate that spread complexity oscillates periodically over time for scarred initial states, whereas such oscillations are suppressed in thermalizing cases.
Submitted 26 June, 2025;
originally announced June 2025.
-
Learning to See in the Extremely Dark
Authors:
Hai Jiang,
Binhao Guan,
Zhen Liu,
Xiaohong Liu,
Jian Yu,
Zheng Liu,
Songchen Han,
Shuaicheng Liu
Abstract:
Learning-based methods have made promising advances in low-light RAW image enhancement, but their capability in extremely dark scenes, where the environmental illuminance drops as low as 0.0001 lux, remains to be explored due to the lack of corresponding datasets. To this end, we propose a paired-to-paired data synthesis pipeline capable of generating well-calibrated extremely low-light RAW images at three precise illuminance ranges of 0.01-0.1 lux, 0.001-0.01 lux, and 0.0001-0.001 lux, together with high-quality sRGB references, to comprise a large-scale paired dataset named See-in-the-Extremely-Dark (SIED) for benchmarking low-light RAW image enhancement approaches. Furthermore, we propose a diffusion-based framework that leverages the generative ability and intrinsic denoising property of diffusion models to restore visually pleasing results from extremely low-SNR RAW inputs, in which an Adaptive Illumination Correction Module (AICM) and a color consistency loss are introduced to ensure accurate exposure correction and color restoration. Extensive experiments on the proposed SIED and publicly available benchmarks demonstrate the effectiveness of our method. The code and dataset are available at https://github.com/JianghaiSCU/SIED.
Submitted 26 June, 2025;
originally announced June 2025.
-
TEMPEST-LoRa: Cross-Technology Covert Communication
Authors:
Xieyang Sun,
Yuanqing Zheng,
Wei Xi,
Zuhao Chen,
Zhizhen Chen,
Han Hao,
Zhiping Jiang,
Sheng Zhong
Abstract:
Electromagnetic (EM) covert channels pose significant threats to computer and communications security in air-gapped networks. Previous works exploit EM radiation from various components (e.g., video cables, memory buses, CPUs) to secretly send sensitive information. These approaches typically require the attacker to deploy highly specialized receivers near the victim, which limits their real-world impact. This paper reports a new EM covert channel, TEMPEST-LoRa, that builds on Cross-Technology Covert Communication (CTCC), which could allow attackers to covertly transmit EM-modulated secret data from air-gapped networks to widely deployed operational LoRa receivers from afar. We reveal the potential risk and demonstrate the feasibility of CTCC by tackling practical challenges involved in manipulating video cables to precisely generate the EM leakage that could readily be received by third-party commercial LoRa nodes/gateways. Experimental results show that attackers can reliably decode secret data modulated by the EM leakage from a video cable at a maximum distance of 87.5 m or a rate of 21.6 kbps. We note that the secret data transmission can be performed with monitors turned off (therefore covertly).
Submitted 26 June, 2025;
originally announced June 2025.
-
One-way network nonlocality of continuous variable entangled networks
Authors:
Jun-Li Jiang,
Xin-Zhu Liu,
Xue Yang,
Xiuyong Ding,
Da Zhang,
Ming-Xing Luo
Abstract:
Nonlocality is a key feature of quantum networks and is being studied for its potential applications in quantum communication and computing. Understanding and harnessing nonlocality in quantum networks could lead to the development of faster and more secure communication systems. So far, all verified nonclassicalities are limited to discrete-variable quantum networks. We propose the first method to verify the network nonlocality of an all-optical quantum network consisting of two entangled states where one-way classical communication is allowed. This provides the first device-independent method to verify the quantum correlations generated from all-optical continuous-variable quantum networks.
Submitted 26 June, 2025;
originally announced June 2025.
-
Experimental Characterization of Quantumness Using the Uncertainty Principle, Coherence, and Nonlocality
Authors:
Yan-Han Yang,
Xin-Zhu Liu,
Xing-Zhou Zheng,
Jun-Li Jiang,
Xue Yang,
Shao-Ming Fei,
Zhihao Ma,
Zizhu Wang,
Ming-Xing Luo
Abstract:
Heisenberg's uncertainty principle, coherence and Bell nonlocality have been individually examined through many experiments. In this Letter, we systematically characterize all of this quantumness in a unified manner. We first construct universal uncertainty relations to reveal intrinsic features of incompatible measurements, which include all the state-independent uncertainties as special cases. We further extend to witness both quantum coherence and Bell nonlocality. We finally perform experiments with unified two-photon states, and validate the uncertainty principle, coherence and Bell nonlocality within the experimental error. Our methods for witnessing quantumness are valuable in characterizing quantum correlations in quantum information processing.
Submitted 26 June, 2025;
originally announced June 2025.
-
A Semi-supervised Scalable Unified Framework for E-commerce Query Classification
Authors:
Chunyuan Yuan,
Chong Zhang,
Zheng Fang,
Ming Pang,
Xue Jiang,
Changping Peng,
Zhangang Lin,
Ching Law
Abstract:
Query classification, including multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the information between labels cannot be used, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct training samples, resulting in a Matthew vicious cycle. Furthermore, the subtasks of query classification lack a unified framework, leading to low efficiency for algorithm optimization.
In this paper, we propose a novel Semi-supervised Scalable Unified Framework (SSUF), containing multiple enhanced modules to unify the query classification tasks. The knowledge-enhanced module uses world knowledge to enhance query representations and solve the problem of insufficient query information. The label-enhanced module uses label semantics and semi-supervised signals to reduce the dependence on posterior labels. The structure-enhanced module enhances the label representation based on the complex label relations. Each module is highly pluggable, and input features can be added or removed as needed according to each subtask. We conduct extensive offline and online A/B experiments, and the results show that SSUF significantly outperforms the state-of-the-art models.
Submitted 26 June, 2025;
originally announced June 2025.
-
Probing valence electron and hydrogen dynamics using charge-pair imaging with ultrafast electron diffraction
Authors:
Tianyu Wang,
Hui Jiang,
Ming Zhang,
Xiao Zou,
Pengfei Zhu,
Feng He,
Zheng Li,
Dao Xiang
Abstract:
A key challenge in ultrafast science has been to directly track the coupled motions of electrons and nuclei in real space and real time. This study presents a significant step towards this goal by demonstrating the feasibility of time-resolved real-space tracking of valence electron and hydrogen dynamics during the photodissociation of ammonia (NH3) using MeV ultrafast electron diffraction. It is demonstrated that the enhanced temporal resolution, in conjunction with the analysis of the charge-pair distribution function, enables the disentanglement of the correlated motion of valence electrons and hydrogens in photoexcited ammonia molecules. The methodology employed in this study, which utilizes the charge-pair distribution function from ultrafast electron scattering to retrieve intertwined electron and nucleus dynamics, may open up new opportunities in the study of quantum dynamics for a wide range of molecules.
Submitted 26 June, 2025;
originally announced June 2025.
-
V2X-REALM: Vision-Language Model-Based Robust End-to-End Cooperative Autonomous Driving with Adaptive Long-Tail Modeling
Authors:
Junwei You,
Pei Li,
Zhuoyu Jiang,
Zilin Huang,
Rui Gan,
Haotian Shi,
Bin Ran
Abstract:
Ensuring robust planning and decision-making under rare, diverse, and visually degraded long-tail scenarios remains a fundamental challenge for autonomous driving in urban environments. This issue becomes more critical in cooperative settings, where vehicles and infrastructure jointly perceive and reason across complex environments. To address this challenge, we propose V2X-REALM, a vision-language model (VLM)-based framework with adaptive multimodal learning for robust cooperative autonomous driving under long-tail scenarios. V2X-REALM introduces three core innovations: (i) a prompt-driven long-tail scenario generation and evaluation pipeline that leverages foundation models to synthesize realistic long-tail conditions such as snow and fog across vehicle- and infrastructure-side views, enriching training diversity efficiently; (ii) a gated multi-scenario adaptive attention module that modulates the visual stream using scenario priors to recalibrate ambiguous or corrupted features; and (iii) a multi-task scenario-aware contrastive learning objective that improves multimodal alignment and promotes cross-scenario feature separability. Extensive experiments demonstrate that V2X-REALM significantly outperforms existing baselines in robustness, semantic reasoning, safety, and planning accuracy under complex, challenging driving conditions, advancing the scalability of end-to-end cooperative autonomous driving.
Submitted 26 June, 2025;
originally announced June 2025.
-
Ferroelectricity in 6 Angstrom-Thick Two-dimensional Ga$_2$O$_3$
Authors:
Tong Jiang,
Han Chen,
Yubo Yuan,
Xiang Xu,
Junwei Cao,
Hao Wang,
Xuechun Sun,
Junshuai Li,
Yaqing Ma,
Huaze Zhu,
Wenbin Li,
Wei Kong
Abstract:
Atomic-scale ferroelectric thin films hold great promise for high-density, low-power applications but face stability and voltage scaling challenges at extreme thinness. Here, we demonstrate ferroelectricity in single-crystalline two-dimensional (2D) Ga$_2$O$_3$, an ultra-wide-bandgap semiconductor, at just 6 angstrom thickness, exhibiting exceptional retention and thermal stability. We show that epitaxial beta-Ga$_2$O$_3$ can be exfoliated down to a half-unit cell thickness via a self-limiting mechanism, enabling a biaxial strain-induced phase transition into a novel ferroelectric layered structure. Strain modulation enables the reduction of polarization switching voltage to 0.8 V, meeting CMOS voltage scaling requirements. Theoretical calculations reveal that switching is driven by covalent bond reconstruction, effectively countering depolarization and enhancing stability. Additionally, we integrate ferroelectric 2D Ga$_2$O$_3$ onto silicon using a low-temperature, back-end-of-line-compatible process. This work advances the exploration of sub-nanometer ferroelectrics, paving the way for high-density, low-power, non-volatile applications seamlessly integrated with advanced silicon technology.
Submitted 26 June, 2025;
originally announced June 2025.
-
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence
Authors:
Feng Jiang,
Mangal Prakash,
Hehuan Ma,
Jianyuan Deng,
Yuzhi Guo,
Amina Mollaysa,
Tommaso Mansi,
Rui Liao,
Junzhou Huang
Abstract:
Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 11 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction.
Submitted 26 June, 2025;
originally announced June 2025.