-
Some Consistent Power Constructions
Authors:
Chengyu Zhou,
Qingguo Li
Abstract:
Consistent Hoare, Smyth and Plotkin power domains are introduced and discussed by Yuan and Kou. The consistent algebraic operation $+$ defined by them is a binary partial Scott continuous operation satisfying the requirement: $a+b$ exists whenever there exists a $c$ which is greater than $a$ and $b$. We extend consistency to a categorical concept and obtain an approach to generating consistent monads from monads on dcpos whose images are equipped with some algebraic operations. Then we provide two new power constructions over domains: the consistent Plotkin index power domain and the consistent probabilistic power domain. Moreover, we verify that these power constructions are free.
Submitted 7 March, 2025;
originally announced March 2025.
-
Mixed Near-field and Far-field Target Localization for Low-altitude Economy
Authors:
Cong Zhou,
Changsheng You,
Chao Zhou,
Hongqiang Cheng,
Shuo Shi
Abstract:
In this paper, we study efficient mixed near-field and far-field target localization methods for the low-altitude economy, by capitalizing on extremely large-scale multiple-input multiple-output (XL-MIMO) communication systems. Compared with existing works, we address three new challenges in localization, arising from 1) the half-wavelength antenna spacing constraint, 2) the hybrid uniform planar array (UPA) architecture, and 3) incorrect mixed-field target classification for near-field targets. To address these issues, we propose a new three-step mixed-field localization method. First, we reconstruct the signals received at UPA antennas by judiciously designing analog combining matrices over time with minimum recovery errors, thus tackling the reduced-dimensional signal-space issue in hybrid arrays. Second, based on the recovered signals, we devise a modified MUSIC algorithm (catered to the UPA architecture) to estimate 2D angular parameters of both far- and near-field targets. Due to half-wavelength inter-antenna spacing, there exist ambiguous angles when estimating true angles of targets. In the third step, we design an effective classification method to distinguish mixed-field targets, determine true angles of all targets, as well as estimate the ranges of near-field targets. In particular, angular ambiguity is resolved by showing an important fact that the three types of estimated angles (i.e., far-field, near-field, and ambiguous angles) exhibit significantly different patterns in the range-domain MUSIC spectrum. Furthermore, to characterize the estimation error lower bound, we obtain matrix closed-form Cramér-Rao bounds for mixed-field target localization. Finally, numerical results demonstrate the effectiveness of our proposed mixed-field localization method, which improves target-classification accuracy and achieves a lower root mean square error than various benchmark schemes.
Submitted 6 March, 2025;
originally announced March 2025.
-
Unveiling the Oxidation Mechanisms of Octa-Penta Graphene: A Multidimensional Exploration from First-Principles to Machine Learning
Authors:
Chenyi Zhou,
Rubin Huo,
Boyi Situ,
Zihan Yan,
Zhe Zhang,
Yusong Tu
Abstract:
Octa-penta graphene (OPG), a novel carbon allotrope characterized by its distinctive arrangement of pentagonal and octagonal rings, has garnered considerable attention due to its exceptional structure and functional properties. This study systematically investigates the oxidation mechanisms of OPG and elucidates the oxygen migration patterns on the OPG monolayer through first-principles calculations and machine-learning-based molecular dynamics (MLMD) simulations. Specifically, the oxidation processes on OPG-L and OPG-Z involve exothermic chemisorption, where oxygen molecules dissociate at the surfaces, forming stable epoxy groups. Furthermore, the integrated-crystal orbital Hamilton population (ICOHP) and Bader charge analyses provide insights into the physical mechanisms of oxygen atom adsorption. Importantly, we found that oxidation also impacts the electronic properties of OPG, with OPG-L retaining its metallic characteristics post-oxygen adsorption, whereas OPG-Z undergoes a transformation from a metallic to a semiconducting state due to the introduction of oxygen. Oxygen migration on the OPG monolayer involves breaking and reforming of C-O bonds, with varying stability across adsorption sites and limited migration along the basal plane. MLMD simulations corroborate these migration patterns, offering detailed migration trajectories consistent with theoretical predictions. These findings enhance the understanding of oxygen migration dynamics on OPG, facilitate its experimental validation, and highlight its potential as a novel 2D material for applications in batteries, heat-resistant materials, and oxidation-resistant coatings.
Submitted 5 March, 2025;
originally announced March 2025.
-
Learning to Reduce Search Space for Generalizable Neural Routing Solver
Authors:
Changliang Zhou,
Xi Lin,
Zhenkun Wang,
Qingfu Zhang
Abstract:
Constructive neural combinatorial optimization (NCO) has attracted growing research attention due to its ability to solve complex routing problems without relying on handcrafted rules. However, existing NCO methods face significant challenges in generalizing to large-scale problems due to high computational complexity and inefficient capture of structural patterns. To address this issue, we propose a novel learning-based search space reduction method that adaptively selects a small set of promising candidate nodes at each step of the constructive NCO process. Unlike traditional methods that rely on fixed heuristics, our selection model dynamically prioritizes nodes based on learned patterns, significantly reducing the search space while maintaining solution quality. Experimental results demonstrate that our method, trained solely on 100-node instances from the uniform distribution, generalizes remarkably well to large-scale Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) instances with up to 1 million nodes from the uniform distribution and over 80K nodes from other distributions.
Submitted 19 May, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Unsupervised Waste Classification By Dual-Encoder Contrastive Learning and Multi-Clustering Voting (DECMCV)
Authors:
Kui Huang,
Mengke Song,
Shuo Ba,
Ling An,
Huajie Liang,
Huanxi Deng,
Yang Liu,
Zhenyu Zhang,
Chichun Zhou
Abstract:
Waste classification is crucial for improving processing efficiency and reducing environmental pollution. Supervised deep learning methods are commonly used for automated waste classification, but they rely heavily on large labeled datasets, which are costly and inefficient to obtain. Real-world waste data often exhibit category and style biases, such as variations in camera angles, lighting conditions, and types of waste, which can impact the model's performance and generalization ability. Therefore, constructing a bias-free dataset is essential. Manual labeling is not only costly but also inefficient. While self-supervised learning helps address data scarcity, it still depends on some labeled data and generally results in lower accuracy compared to supervised methods. Unsupervised methods show potential in certain cases but typically do not perform as well as supervised models, highlighting the need for an efficient and cost-effective unsupervised approach. This study presents a novel unsupervised method, Dual-Encoder Contrastive Learning with Multi-Clustering Voting (DECMCV). The approach involves using a pre-trained ConvNeXt model for image encoding, leveraging VisionTransformer to generate positive samples, and applying a multi-clustering voting mechanism to address data labeling and domain shift issues. Experimental results demonstrate that DECMCV achieves classification accuracies of 93.78% and 98.29% on the TrashNet and Huawei Cloud datasets, respectively, outperforming or matching supervised models. On a real-world dataset of 4,169 waste images, only 50 labeled samples were needed to accurately label thousands, improving classification accuracy by 29.85% compared to supervised models. This method effectively addresses style differences, enhances model generalization, and contributes to the advancement of automated waste classification.
Submitted 3 March, 2025;
originally announced March 2025.
-
First Measurement of the Decay Dynamics in the Semileptonic Transition of the $D^{+(0)}$ into the Axial-vector Meson $\bar K_1(1270)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (680 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data taken at the center-of-mass energy of 3.773 GeV with the BESIII detector, corresponding to an integrated luminosity of 20.3 fb$^{-1}$, we report the first amplitude and angular analyses of the semileptonic decays $D^{+(0)}\to K^-π^+π^{0(-)} e^+ν_e$. From the amplitude analysis, we determine for the first time the hadronic form factors of the semileptonic $D$ decays into the axial-vector meson $\bar{K}_1(1270)$ to be $r_A=(-11.2\pm1.0\pm0.9)\times10^{-2}$ and $r_V = (-4.3\pm 1.0\pm2.4)\times 10^{-2}$. The angular analysis yields an up-down asymmetry $\mathcal{A}^\prime_{ud} = 0.01\pm0.11$, which is consistent with the Standard Model prediction.
Submitted 3 March, 2025;
originally announced March 2025.
-
NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU
Authors:
Cong Ma,
Du Wu,
Zhelang Deng,
Jiang Chen,
Xiaowen Huang,
Jintao Meng,
Wenxi Zhu,
Bingqiang Wang,
Amelie Chi Zhou,
Peng Chen,
Minwen Deng,
Yanjie Wei,
Shengzhong Feng,
Yi Pan
Abstract:
Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. In response to this issue, weight pruning, particularly through N:M sparsity matrix multiplication, offers an efficient solution by transforming dense operations into semi-sparse ones. N:M sparsity provides an option for balancing performance and model accuracy, but introduces more complex programming and optimization challenges. To address these issues, we design a systematic top-down performance analysis model for N:M sparsity. Meanwhile, NM-SpMM is proposed as an efficient general N:M sparsity implementation. Based on our performance analysis, NM-SpMM employs a hierarchical blocking mechanism as a general optimization to enhance data locality, while memory access optimization and pipeline design are introduced as sparsity-aware optimization, allowing it to achieve close-to-theoretical peak performance across different sparsity levels. Experimental results show that NM-SpMM is 2.1x faster than nmSPARSE (the state-of-the-art for general N:M sparsity) and 1.4x to 6.3x faster than cuBLAS's dense GEMM operations, closely approaching the theoretical maximum speedup resulting from the reduction in computation due to sparsity. NM-SpMM is open source and publicly available at https://github.com/M-H482/NM-SpMM.
Submitted 4 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
A sharp-interface approach for simulating solid-state dewetting of thin films with double-bubble structure
Authors:
Meng Li,
Nan Wang,
Ruofan Zhao,
Chunjie Zhou
Abstract:
We develop a sharp-interface model for solid-state dewetting of double-bubble thin films using an energy variational approach based on a newly proposed interfacial energy. This model characterizes the dynamic evolution of interfaces in double-bubble thin films, a process primarily governed by surface diffusion and junction/contact point migration, and fundamentally distinct from the behavior observed in a single thin film. Subsequently, a structure-preserving parametric finite element approximation is developed for the sharp-interface model, which preserves both area conservation and energy stability. Extensive numerical experiments are presented to demonstrate the convergence, structure-preserving properties, and superior mesh quality of the proposed method. Additionally, we investigate several specific evolution processes, including the equilibrium shapes of double-bubble thin films and the pinch-off dynamics of long islands.
Submitted 4 March, 2025; v1 submitted 2 March, 2025;
originally announced March 2025.
-
The Feasibility Study of the GeV-Energy Muon Source Based on HIAF
Authors:
Yu Xu,
Xueheng Zhang,
Yuhong Yu,
Pei Yu,
Li Deng,
Jiajia Zhai,
Liangwen Chen,
He Zhao,
Lina Sheng,
Guodong Shen,
Ziwen Pan,
Qite Li,
Chen Zhou,
Qiang Li,
Lei Yang,
Zhiyu Sun
Abstract:
Generating a mono-energetic, high-energy muon beam using accelerator facilities can be very attractive for many purposes, for example, improving muon tomography, which is currently limited by the low flux and wide energy spread of cosmic ray muons, and searching for muon-related new physics beyond the Standard Model. One potential accelerator facility is the High Intensity Heavy-Ion Accelerator Facility (HIAF), which is currently under construction in Huizhou City, China. Considering the projectile energy and beamline length, a high-intensity and GeV-energy muon flux could be produced and delivered by the High Energy Fragment Separator beamline of the HIAF facility. In this paper, the flux intensity and purity of the muon beam based on HIAF are discussed in detail. For the $μ^+$ beam, the highest muon yield reaches $8.2 \times 10^6 ~ μ$/s with a purity of approximately $2\%$ at a momentum of 3.5 GeV/c; meanwhile, for the $μ^-$ beam, the maximum muon yield is $4.2 \times 10^6 ~ μ$/s with a purity of around $20\%$ at a momentum of 1.5 GeV/c. The results also indicate that, for muon beams with an energy of several GeV, by applying a suitable purification strategy, we can get a muon beam with a purity of $100\%$ and an intensity of the order of $10^5 ~ μ$/s.
Submitted 21 May, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Improved measurement of absolute branching fraction of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (679 additional authors not shown)
Abstract:
By analyzing $4.5$ fb$^{-1}$ of $e^{+}e^{-}$ collision data accumulated with the BESIII detector at center-of-mass energies ranging from $4599.53$ MeV to $4698.82$ MeV, we report the measurement of the absolute branching fraction (BF) of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$ using the double-tag technique. The result is $\mathcal{B}(Λ_{c}^{+} \to K_{S}^{0} X)=(10.9\pm0.2\pm0.1)\%$, where the first uncertainty is statistical and the second is systematic. This result indicates that there are still undiscovered decay channels containing $K_{S}^{0}$ in the final state with a combined BF of $(3.1\pm0.4)\%$. The BF of the inclusive decay $Λ_{c}^{+} \to \overline{K}^{0} / K^{0} X$ is calculated to be $\mathcal{B}(Λ_{c}^{+} \to \overline{K}^{0} / K^{0} X)=(21.8 \pm0.4 \pm0.2 \pm1.1)\%$, where the third uncertainty accounts for a possible difference between $\mathcal{B}(Λ_{c}^{+} \to K_{S}^{0} X)$ and $\mathcal{B}(Λ_{c}^{+} \to K_{L}^{0} X)$. The result is in agreement with the prediction of the statistical isospin model.
Submitted 21 June, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
A space-resolved visible spectrometer system using compact endoscopic optics for full vertical profile measurement of impurity line emissions in superconducting EAST tokamak
Authors:
A. Hu,
Y. Cheng,
L. Zhang,
S. Morita,
J. Ma,
M. Kobayashi,
C. Zhou,
J. Chen,
Y. Cao,
F. Zhang,
W. Zhang,
Z. Li,
D. Mitnik,
S. Wang,
Y. Jie,
G. Zuo,
J. Qian,
H. Liu,
G. Xu,
J. Hu,
K. Lu,
Y. Song
Abstract:
In the Experimental Advanced Superconducting Tokamak (EAST), which has tungsten divertors and a molybdenum first wall, lithiumization and boronization have been frequently carried out to improve the plasma performance, in particular in long-pulse discharges. A study of the impurity behaviors of lithium, boron and tungsten atoms/ions in the edge plasma is therefore crucially important. For this purpose, a space-resolved visible spectrometer system has been newly developed to observe full vertical profiles, over a length of 1.7 m, of impurity line emissions in the wavelength range of 320-800 nm. For the full vertical profile measurement, compact endoscopic optics is employed with an optical fiber bundle for the system, which can be inserted into a 1.5 m long extension tube called the 'long nose', because the distance between the diagnostic port and the plasma center is considerably long. Therefore, a quartz glass window mounted from the vacuum vessel side is designed to withstand the reverse pressure. A mechanical shutter is also designed to open at a large angle of 235 degrees so that the viewing angle of nearby ports is not blocked. Two sets of fiber bundles, a 60-channel linear array and an 11*10 channel planar array, each with a length of 30 m, are attached to two sets of Czerny-Turner visible spectrometers for one-dimensional (1D) vertical profile measurement of the core plasma and two-dimensional (2D) spectroscopy of the divertor plasma, respectively. A complementary metal oxide semiconductor (CMOS) detector with 2048*2048 pixels is used for the visible spectrometers. A preliminary result on the full vertical profile is obtained for BII line emission at 703.19 nm in the 1D system.
Submitted 26 February, 2025;
originally announced February 2025.
-
Renaissance of Literate Programming in the Era of LLMs: Enhancing LLM-Based Code Generation in Large-Scale Projects
Authors:
Wuyang Zhang,
Yansong Li,
Zeyu Dong,
Yu Wu,
Yingyao Zhou,
Duolei Wang,
Songsirou Xing,
Chichun Zhou,
Da Shen
Abstract:
Large Language Models (LLMs) have helped programmers increase efficiency through code generation, comprehension, and repair. However, their application to large-scale projects remains challenging due to complex interdependencies and the extensive size of modern codebases. Although Knuth's concept of Literate Programming (LP) combines code and natural language to convey logic and intent, its potential for enhancing relationships in large projects has not been fully explored. In this study, we introduce the idea of Interoperable LP (ILP), which leverages literate programming principles to enhance the development of both small-scale documents and large-scale projects with LLMs. We investigate how LLMs perform under ILP-style instructions for both document-oriented tasks and entire projects. Recognizing that many researchers rely on well-structured templates to guide LLMs, we propose a concise prompt engineering method to write LP documents so LLMs can better be involved in code generation. We also examine the capacity of various LLMs to generate Scheme and Python code on the RepoBench benchmark, illustrating the advantages of our approach. Our findings indicate that ILP with LLMs can enhance LLM-based code generation in large-scale project development.
Submitted 25 December, 2024;
originally announced February 2025.
-
Large Language Model for Lossless Image Compression with Visual Prompts
Authors:
Junhao Du,
Chuqin Zhou,
Ning Cao,
Gang Chen,
Yunuo Chen,
Zhengxue Cheng,
Li Song,
Guo Lu,
Wenjun Zhang
Abstract:
Recent advancements in deep learning have driven significant progress in lossless image compression. With the emergence of Large Language Models (LLMs), preliminary attempts have been made to leverage the extensive prior knowledge embedded in these pretrained models to enhance lossless image compression, particularly by improving the entropy model. However, a significant challenge remains in bridging the gap between the textual prior knowledge within LLMs and lossless image compression. To tackle this challenge and unlock the potential of LLMs, this paper introduces a novel paradigm for lossless image compression that incorporates LLMs with visual prompts. Specifically, we first generate a lossy reconstruction of the input image as visual prompts, from which we extract features to serve as visual embeddings for the LLM. The residual between the original image and the lossy reconstruction is then fed into the LLM along with these visual embeddings, enabling the LLM to function as an entropy model to predict the probability distribution of the residual. Extensive experiments on multiple benchmark datasets demonstrate our method achieves state-of-the-art compression performance, surpassing both traditional and learning-based lossless image codecs. Furthermore, our approach can be easily extended to images from other domains, such as medical and screen content images, achieving impressive performance. These results highlight the potential of LLMs for lossless image compression and may inspire further research in related directions.
Submitted 22 February, 2025;
originally announced February 2025.
-
LitLinker: Supporting the Ideation of Interdisciplinary Contexts with Large Language Models for Teaching Literature in Elementary Schools
Authors:
Haoxiang Fan,
Changshuang Zhou,
Hao Yu,
Xueyang Wu,
Jiangyu Gu,
Zhenhui Peng
Abstract:
Teaching literature under interdisciplinary contexts (e.g., science, art) that connect reading materials has become popular in elementary schools. However, constructing such contexts is challenging as it requires teachers to explore substantial amounts of interdisciplinary content and link it to the reading materials. In this paper, we develop LitLinker via an iterative design process involving 13 teachers to facilitate the ideation of interdisciplinary contexts for teaching literature. Powered by a large language model (LLM), LitLinker can recommend interdisciplinary topics and contextualize them with the literary elements (e.g., paragraphs, viewpoints) in the reading materials. A within-subjects study (N=16) shows that compared to an LLM chatbot, LitLinker can improve the integration depth of different subjects and reduce workload in this ideation task. Expert interviews (N=9) also demonstrate LitLinker's usefulness for supporting the ideation of interdisciplinary contexts for teaching literature. We conclude with concerns and design considerations for supporting interdisciplinary teaching with LLMs.
Submitted 22 February, 2025;
originally announced February 2025.
-
Nearshore Underwater Target Detection Meets UAV-borne Hyperspectral Remote Sensing: A Novel Hybrid-level Contrastive Learning Framework and Benchmark Dataset
Authors:
Jiahao Qi,
Chuanhong Zhou,
Xingyue Liu,
Chen Chen,
Dehui Zhu,
Kangcheng Bin,
Ping Zhong
Abstract:
UAV-borne hyperspectral remote sensing has emerged as a promising approach for underwater target detection (UTD). However, its effectiveness is hindered by spectral distortions in nearshore environments, which compromise the accuracy of traditional hyperspectral UTD (HUTD) methods that rely on bathymetric models. These distortions lead to significant uncertainty in target and background spectra, challenging the detection process. To address this, we propose the Hyperspectral Underwater Contrastive Learning Network (HUCLNet), a novel framework that integrates contrastive learning with a self-paced learning paradigm for robust HUTD in nearshore regions. HUCLNet extracts discriminative features from distorted hyperspectral data through contrastive learning, while the self-paced learning strategy selectively prioritizes the most informative samples. Additionally, a reliability-guided clustering strategy enhances the robustness of learned representations. To evaluate the method's effectiveness, we construct a novel nearshore HUTD benchmark dataset, ATR2-HUTD, covering three diverse scenarios with varying water types, turbidity, and target types. Extensive experiments demonstrate that HUCLNet significantly outperforms state-of-the-art methods. The dataset and code will be publicly available at: https://github.com/qjh1996/HUTD
Submitted 20 February, 2025;
originally announced February 2025.
-
Disentangling Long-Short Term State Under Unknown Interventions for Online Time Series Forecasting
Authors:
Ruichu Cai,
Haiqin Huang,
Zhifang Jiang,
Zijian Li,
Changze Zhou,
Yuequn Liu,
Yuming Liu,
Zhifeng Hao
Abstract:
Current methods for time series forecasting struggle in the online scenario, since it is difficult to preserve long-term dependency while adapting to short-term changes when data arrive sequentially. Although some recent methods solve this problem by controlling the updates of latent states, they cannot disentangle the long/short-term states, leading to an inability to effectively adapt to nonstationarity. To tackle this challenge, we propose a general framework to disentangle long/short-term states for online time series forecasting. Our idea is inspired by the observation that short-term changes can be caused by unknown interventions, such as abrupt policies in the stock market. Based on this insight, we formalize a data generation process with unknown interventions on short-term states. Under mild assumptions, we further leverage the independence of short-term states caused by unknown interventions to establish an identification theory that achieves the disentanglement of long/short-term states. Built on this theory, we develop a long short-term disentanglement model (LSTD) to extract the long/short-term states with long/short-term encoders, respectively. Furthermore, the LSTD model incorporates a smooth constraint to preserve the long-term dependencies and an interrupted dependency constraint to enforce the forgetting of short-term dependencies, together boosting the disentanglement of long/short-term states. Experimental results on several benchmark datasets show that our LSTD model outperforms existing methods for online time series forecasting, validating its efficacy in real-world applications.
Submitted 18 February, 2025;
originally announced February 2025.
-
A Unified Modeling Framework for Automated Penetration Testing
Authors:
Yunfei Wang,
Shixuan Liu,
Wenhao Wang,
Changling Zhou,
Chao Zhang,
Jiandong Jin,
Cheng Zhu
Abstract:
The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-efficiency and swift feedback capabilities. Despite the proliferation of AutoPT research, there is a recognized gap in the availability of a unified framework for simulation modeling methods. This paper presents a systematic review and synthesis of existing techniques, introducing MDCPM to categorize studies based on literature objectives, network simulation complexity, dependency of technical and tactical operations, and scenario feedback and variation. To address the lack of a unified method for multi-dimensional and multi-level simulation modeling and for dynamic environment modeling, as well as the scarcity of public datasets, we introduce AutoPT-Sim, a novel modeling framework that is based on policy automation and covers the combination of all sub-dimensions. AutoPT-Sim offers a comprehensive approach to modeling network environments, attackers, and defenders, transcending the constraints of static modeling and accommodating networks of diverse scales. We publicly release a generated standard network environment dataset and the code of the Network Generator. By flexibly integrating publicly available datasets, the framework supports the various simulation modeling levels focused on policy automation in MDCPM, and the network generator helps researchers output customized target network data by adjusting parameters or fine-tuning the generator.
Submitted 17 February, 2025;
originally announced February 2025.
-
CRB-Rate Tradeoff in RSMA-enabled Near-Field Integrated Multi-Target Sensing and Multi-User Communications
Authors:
Jiasi Zhou,
Cong Zhou,
Yanjing Sun,
Chintha Tellambura
Abstract:
Extremely large-scale antenna arrays enhance spectral efficiency and spatial resolution in integrated sensing and communication (ISAC) networks while expanding the Rayleigh distance, triggering a shift from conventional far-field plane waves to near-field (NF) spherical waves. However, full-digital beamforming is infeasible due to the need for dedicated radio frequency (RF) chains. To address this, NF-ISAC with a rate-splitting multiple access (RSMA) scheme is developed for advanced interference management, considering fully-connected and partially-connected hybrid analog and digital (HAD) beamforming architectures. Specifically, the Cramér-Rao bound (CRB) for joint distance and angle sensing is derived, and the achievable performance region between the max-min communication rate and the multi-target CRB is defined. To fully characterize the Pareto boundary of the CRB-rate region, a sensing-centric minimization problem is formulated under communication rate constraints for two HAD beamforming architectures. A penalty dual decomposition (PDD)-based double-loop algorithm is developed to optimize fully-connected HAD beamformers. To reduce computational complexity, a two-stage design algorithm for fully connected HAD beamforming is also proposed. Additionally, the PDD-based double-loop algorithm is extended to the partially-connected HAD architecture. Simulations demonstrate the proposed schemes and algorithms: 1) achieve performance comparable to a fully digital beamformer with fewer RF chains, 2) outperform space division multiple access and far-field ISAC, and 3) yield enhanced CRB-rate trade-off performance.
Submitted 17 February, 2025;
originally announced February 2025.
-
Learning Surrogate Potential Mean Field Games via Gaussian Processes: A Data-Driven Approach to Ill-Posed Inverse Problems
Authors:
Jingguo Zhang,
Xianjin Yang,
Chenchen Mou,
Chao Zhou
Abstract:
Mean field games (MFGs) describe the collective behavior of large populations of interacting agents. In this work, we tackle ill-posed inverse problems in potential MFGs, aiming to recover the agents' population, momentum, and environmental setup from limited, noisy measurements and partial observations. These problems are ill-posed because multiple MFG configurations can explain the same data, or different parameters can yield nearly identical observations. Nonetheless, they remain crucial in practice for real-world scenarios where data are inherently sparse or noisy, or where the MFG structure is not fully determined. Our focus is on finding surrogate MFGs that accurately reproduce the observed data despite these challenges. We propose two Gaussian process (GP)-based frameworks: an inf-sup formulation and a bilevel approach. The choice between them depends on whether the unknown parameters introduce concavity in the objective. In the inf-sup framework, we use the linearity of GPs and their parameterization structure to maintain convex-concave properties, allowing us to apply standard convex optimization algorithms. In the bilevel framework, we employ a gradient-descent-based algorithm and introduce two methods for computing the outer gradient. The first method leverages an existing solver for the inner potential MFG and applies automatic differentiation, while the second adopts an adjoint-based strategy that computes the outer gradient independently of the inner solver. Our numerical experiments show that when sufficient prior information is available, the unknown parameters can be accurately recovered. Otherwise, if prior information is limited, the inverse problem is ill-posed, but our frameworks can still produce surrogate MFG models that closely match observed data.
Submitted 17 February, 2025;
originally announced February 2025.
-
DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
Authors:
Zhenheng Tang,
Zichen Tang,
Junlin Huang,
Xinglin Pan,
Rudan Yan,
Yuxin Wang,
Amelie Chi Zhou,
Shaohuai Shi,
Xiaowen Chu,
Bo Li
Abstract:
The growth of large language models (LLMs) makes it increasingly challenging to accelerate distributed training across multiple GPUs in different data centers. Moreover, concerns about data privacy and data exhaustion have heightened interest in geo-distributed data centers. Communication in geo-distributed data parallel training (DDP) with stochastic gradient descent (S-SGD) is the main bottleneck in low-bandwidth environments. Local SGD mitigates communication overhead by reducing synchronization frequency, and recent studies have successfully applied it to pre-train LLMs in geo-distributed settings. However, we find that its model synchronization mechanism prevents the system from overlapping communication with computation.
To overcome this limitation, we expand the design space of local SGD by decoupling model synchronization layer by layer. In each iteration, only some layers are synchronized, instead of synchronizing the entire model after a specific number of iterations. Leveraging this methodology, we introduce DreamDDP, a training framework to accelerate low-bandwidth distributed training with three key innovations: (1) partial local SGD with theoretical assurances of convergence rates comparable to S-SGD; (2) overlapping parameter synchronization with computation without extra GPU memory occupation; (3) identifying and exploiting three properties to schedule the communication and computation to reduce the training time based on fine-grained profiling of layer-wise communication and computation time. Empirical evaluations conducted on 32 GPUs using prominent deep learning models, including ResNet-18, ResNet-50, GPT-2, and Llama-2, demonstrate that DreamDDP enhances the convergence properties of Local SGD (and Adam) and achieves speedups ranging from $1.49\times$ to $3.91\times$ over leading baseline methods.
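The idea of synchronizing only some layers in each iteration can be illustrated with a toy sketch. This is not DreamDDP's scheduler, which the abstract describes as profiling-based; the round-robin rule, the function names `layers_to_sync` and `average_layers`, and the use of one scalar per layer are all assumptions made here for illustration.

```python
from typing import Dict, List


def layers_to_sync(step: int, num_layers: int, period: int) -> List[int]:
    """Pick the layers synchronized at `step` (hypothetical round-robin rule).

    Every layer is chosen exactly once in any window of `period` steps.
    """
    slot = step % period
    return [layer for layer in range(num_layers) if layer % period == slot]


def average_layers(workers: List[Dict[int, float]], layers: List[int]) -> None:
    """Overwrite the chosen layers on every worker with the cross-worker mean.

    Each worker maps a layer index to a single float standing in for that
    layer's parameters; layers not listed are left untouched.
    """
    for layer in layers:
        mean = sum(worker[layer] for worker in workers) / len(workers)
        for worker in workers:
            worker[layer] = mean


if __name__ == "__main__":
    # Two workers, four layers, each layer synchronized once every two steps.
    workers = [
        {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0},
        {0: 3.0, 1: 6.0, 2: 5.0, 3: 8.0},
    ]
    for step in range(2):
        chosen = layers_to_sync(step, num_layers=4, period=2)
        average_layers(workers, chosen)
        print(step, chosen, workers)
```

Under this toy rule, each step communicates only part of the model, whereas plain local SGD would communicate all layers at once every `period` steps.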
Submitted 16 February, 2025;
originally announced February 2025.
-
Latent Radiance Fields with 3D-aware 2D Representations
Authors:
Chaoyi Zhou,
Xi Liu,
Feng Luo,
Siyu Huang
Abstract:
Latent 3D reconstruction has shown great promise in empowering 3D semantic understanding and 3D generation by distilling 2D features into the 3D space. However, existing approaches struggle with the domain gap between 2D feature space and 3D representations, resulting in degraded rendering performance. To address this challenge, we propose a novel framework that integrates 3D awareness into the 2D latent space. The framework consists of three stages: (1) a correspondence-aware autoencoding method that enhances the 3D consistency of 2D latent representations, (2) a latent radiance field (LRF) that lifts these 3D-aware 2D representations into 3D space, and (3) a VAE-Radiance Field (VAE-RF) alignment strategy that improves image decoding from the rendered 2D representations. Extensive experiments demonstrate that our method outperforms the state-of-the-art latent 3D reconstruction approaches in terms of synthesis performance and cross-dataset generalizability across diverse indoor and outdoor scenes. To our knowledge, this is the first work showing the radiance field representations constructed from 2D latent representations can yield photorealistic 3D reconstruction performance.
Submitted 13 February, 2025;
originally announced February 2025.
-
RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation
Authors:
Changzhi Zhou,
Xinyu Zhang,
Dandan Song,
Xiancai Chen,
Wanli Gu,
Huipeng Ma,
Yuhang Tian,
Mengdi Zhang,
Linmei Hu
Abstract:
Code generation has attracted increasing attention with the rise of Large Language Models (LLMs). Many studies have developed powerful code LLMs by synthesizing code-related instruction data and applying supervised fine-tuning. However, these methods are limited by teacher model distillation and ignore the potential of iterative refinement by self-generated code. In this paper, we propose Adaptive Critique Refinement (ACR), which enables the model to refine itself by self-generated code and external critique, rather than directly imitating the code responses of the teacher model. Concretely, ACR includes a composite scoring system with LLM-as-a-Judge to evaluate the quality of code responses and a selective critique strategy with LLM-as-a-Critic to critique self-generated low-quality code responses. We develop the RefineCoder series by iteratively applying ACR, achieving continuous performance improvement on multiple code generation benchmarks. Compared to the baselines of the same size, our proposed RefineCoder series can achieve comparable or even superior performance using less data.
Submitted 13 February, 2025;
originally announced February 2025.
-
Precise Measurement of the $χ_{c0}$ Resonance Parameters and Branching Fractions of $χ_{c0,c2}\toπ^+π^-/K^+K^-$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (648 additional authors not shown)
Abstract:
By analyzing a $ψ(3686)$ data sample containing $(107.7\pm0.6)\times10^{6}$ events taken with the BESIII detector at the BEPCII storage ring in 2009, the $χ_{c0}$ resonance parameters are precisely measured using $χ_{c0,c2} \to π^+π^-/K^+K^-$ events. The mass of the $χ_{c0}$ is determined to be $M(χ_{c0})=(3415.67\pm0.07\pm0.06\pm0.07)$~MeV/$c^2$, and its full width is $Γ(χ_{c0})=(12.44\pm0.12\pm0.12)~{\rm MeV}$, where the first uncertainty is statistical, the second is systematic, and the third, for the mass, comes from the $χ_{c2}$ mass uncertainty. These measurements improve the precision of the $χ_{c0}$ mass by a factor of four and of the width by one order of magnitude over the previous individual measurements, and significantly boost our knowledge about the charmonium spectrum. Together with an additional $(345.4\pm2.6)\times10^{6}$ $ψ(3686)$ events taken in 2012, the decay branching fractions of $χ_{c0,c2}\toπ^+π^-/K^+K^-$ are measured as well, with precision improved by a factor of three compared to previous measurements. These $χ_{c0}$ decay branching fractions provide important inputs for the study of glueballs.
Submitted 1 July, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
-
GCoT: Chain-of-Thought Prompt Learning for Graphs
Authors:
Xingtong Yu,
Chang Zhou,
Zhongwei Kuai,
Xinming Zhang,
Yuan Fang
Abstract:
Chain-of-thought (CoT) prompting has achieved remarkable success in natural language processing (NLP). However, its vast potential remains largely unexplored for graphs. This raises an interesting question: How can we design CoT prompting for graphs to guide graph models to learn step by step? On one hand, unlike natural languages, graphs are non-linear and characterized by complex topological structures. On the other hand, many graphs lack textual data, making it difficult to formulate language-based CoT prompting. In this work, we propose the first CoT prompt learning framework for text-free graphs, GCoT. Specifically, we decompose the adaptation process for each downstream task into a series of inference steps, with each step consisting of prompt-based inference, ``thought'' generation, and thought-conditioned prompt learning. While the steps mimic CoT prompting in NLP, the exact mechanism differs significantly. Specifically, at each step, an input graph, along with a prompt, is first fed into a pre-trained graph encoder for prompt-based inference. We then aggregate the hidden layers of the encoder to construct a ``thought'', which captures the working state of each node in the current step. Conditioned on this thought, we learn a prompt specific to each node based on the current state. These prompts are fed into the next inference step, repeating the cycle. To evaluate and analyze the effectiveness of GCoT, we conduct comprehensive experiments on eight public datasets, which demonstrate the advantage of our approach.
Submitted 2 June, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Testing spooky action between free-traveling electron-positron pairs
Authors:
Leyun Gao,
Alim Ruzi,
Qite Li,
Chen Zhou,
Qiang Li
Abstract:
Quantum entanglement is a cornerstone of quantum mechanics. While the entanglement of confined electron pairs has been established early on, the entanglement of free-traveling electron pairs, particularly at high energies, remains largely unexplored due to the substantial challenges involved in measuring the spins of free-traveling electrons. In this study, we investigate the entanglement and the Bell inequality violation of free-traveling electron-positron pairs generated in a fixed-target experiment. This experimental setup facilitates the creation of a controllable source of entangled electron-positron pairs, where entangled events are produced in specific phase spaces. Based on this source and the prior knowledge of the entangled state, we demonstrate the feasibility of measuring the polarization correlations of the entangled $e^+e^-$ pairs through their individual secondary scatterings off two separate additional targets.
Submitted 11 February, 2025;
originally announced February 2025.
-
Less is More: Masking Elements in Image Condition Features Avoids Content Leakages in Style Transfer Diffusion Models
Authors:
Lin Zhu,
Xinbing Wang,
Chenghu Zhou,
Qinying Gu,
Nanyang Ye
Abstract:
Given a style-reference image as the additional image condition, text-to-image diffusion models have demonstrated impressive capabilities in generating images that possess the content of text prompts while adopting the visual style of the reference image. However, current state-of-the-art methods often struggle to disentangle content and style from style-reference images, leading to issues such as content leakages. To address this issue, we propose a masking-based method that efficiently decouples content from style without the need of tuning any model parameters. By simply masking specific elements in the style reference's image features, we uncover a critical yet under-explored principle: guiding with appropriately-selected fewer conditions (e.g., dropping several image feature elements) can efficiently avoid unwanted content flowing into the diffusion models, enhancing the style transfer performances of text-to-image diffusion models. In this paper, we validate this finding both theoretically and experimentally. Extensive experiments across various styles demonstrate the effectiveness of our masking-based method and support our theoretical results.
Submitted 11 February, 2025;
originally announced February 2025.
-
Search for $e^+e^-\to K_S^0 K_S^0 h_c$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (642 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data at 13 center-of-mass energies ranging from 4.600 to 4.950 GeV collected with the BESIII detector, we search for the unmeasured $e^+e^-\to K_S^0 K_S^0 h_c$ process. No significant signal is observed, and upper limits on the Born cross sections at each center-of-mass energy are presented.
Submitted 27 May, 2025; v1 submitted 11 February, 2025;
originally announced February 2025.
-
Prot2Chat: Protein LLM with Early-Fusion of Text, Sequence and Structure
Authors:
Zhicong Wang,
Zicheng Ma,
Ziqiang Cao,
Changlong Zhou,
Jun Zhang,
Yiqin Gao
Abstract:
Motivation: Proteins are of great significance in living organisms. However, understanding their functions encounters numerous challenges, such as insufficient integration of multimodal information, a large number of training parameters, limited flexibility of classification-based methods, and the lack of systematic evaluation metrics for protein Q&A systems. To tackle these issues, we propose the Prot2Chat framework. Results: We modified ProteinMPNN to encode protein sequence and structural information in a unified way. We used a large language model (LLM) to encode questions into vectors and developed a protein-text adapter to compress protein information into virtual tokens based on these vectors, achieving the early fusion of text and protein information. Finally, the same LLM reads the virtual tokens and the questions to generate answers. To optimize training efficiency, we froze the encoder and employed Low-Rank Adaptation (LoRA) techniques for the LLM. On two datasets, both automated metrics and expert evaluations demonstrate the superior performance of our model, and zero-shot prediction results highlight its generalization ability. The models and code are available at https://github.com/wangzc1233/Prot2Chat. Contact: [email protected] or [email protected] Key words: Protein Q&A, Early-Fusion, LLM
Submitted 22 May, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
SAMGPT: Text-free Graph Foundation Model for Multi-domain Pre-training and Cross-domain Adaptation
Authors:
Xingtong Yu,
Zechuan Gong,
Chang Zhou,
Yuan Fang,
Hui Zhang
Abstract:
Graphs are able to model interconnected entities in many online services, supporting a wide range of applications on the Web. This raises an important question: How can we train a graph foundational model on multiple source domains and adapt to an unseen target domain? A major obstacle is that graphs from different domains often exhibit divergent characteristics. Some studies leverage large language models to align multiple domains based on textual descriptions associated with the graphs, limiting their applicability to text-attributed graphs. For text-free graphs, a few recent works attempt to align different feature distributions across domains, while generally neglecting structural differences. In this work, we propose a novel Structure Alignment framework for text-free Multi-domain Graph Pre-Training and cross-domain adaptation (SAMGPT). It is designed to learn multi-domain knowledge from graphs originating in multiple source domains, which can then be adapted to address applications in an unseen target domain. Specifically, we introduce a set of structure tokens to harmonize structure-based aggregation across source domains during the pre-training phase. Next, for cross-domain adaptation, we design dual prompts, namely, holistic prompts and specific prompts, which adapt unified multi-domain structural knowledge and fine-grained, domain-specific information, respectively, to a target domain. Finally, we conduct comprehensive experiments on seven public datasets to evaluate and analyze the effectiveness of SAMGPT.
Submitted 12 April, 2025; v1 submitted 7 February, 2025;
originally announced February 2025.
-
Relationship between 2D and 3D Galaxy Stellar Mass and Correlations with Halo Mass
Authors:
Conghao Zhou,
Alexie Leauthaud,
Shuo Xu,
Benedikt Diemer,
Song Huang,
Katya Leidig,
Tesla Jeltema,
Marco Gatti,
Yifei Luo,
Carlo Cannarozzo,
Sven Heydenreich
Abstract:
Recent studies suggest that the stars in the outer regions of massive galaxies trace halo mass better than the inner regions and that an annular stellar mass provides a low scatter method of selecting galaxy clusters. However, we can only observe galaxies as projected two-dimensional objects on the sky. In this paper, we use a sample of simulated galaxies to study how well galaxy stellar mass profiles in three dimensions correlate with halo mass, and what effects arise when observationally projecting stellar profiles into two dimensions. We compare 2D and 3D outer stellar mass selections and find that they have similar performance as halo mass proxies and that, surprisingly, a 2D selection sometimes has marginally better performance. We also investigate whether the weak lensing profiles around galaxies selected by 2D outer stellar mass suffer from projection effects. We find that the lensing profiles of samples selected by 2D and 3D definitions are nearly identical, suggesting that the 2D selection does not create a bias. These findings underscore the promise of using outer stellar mass as a tool for identifying galaxy clusters.
Submitted 7 February, 2025;
originally announced February 2025.
-
Observation of $D\to \bar{K}_{1}(1270)μ^+ν_μ$ and test of lepton flavor universality with $D\to \bar{K}_1(1270) \ell^{+} ν_{\ell}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (646 additional authors not shown)
Abstract:
By analyzing 7.93 $\rm fb^{-1}$ of $e^+e^-$ collision data collected at the center-of-mass energy of 3.773 GeV with the BESIII detector operated at the BEPCII collider, we report the observation of the semimuonic decays of $D^+\to \bar K_1(1270)^0μ^+ν_μ$ and $D^0\to K_1(1270)^-μ^+ν_μ$ with statistical significances of $12.5σ$ and $6.0σ$, respectively. Their decay branching fractions are determined to be ${\mathcal B}[D^{+}\to \bar{K}_1(1270)^0 μ^{+}ν_μ]=(2.36\pm0.20^{+0.18}_{-0.27}\pm 0.48)\times10^{-3}$ and ${\mathcal B}[D^{0}\to K_1(1270)^{-} μ^{+}ν_μ]=(0.78\pm0.11^{+0.05}_{-0.09}\pm 0.15)\times10^{-3}$, where the first and second uncertainties are statistical and systematic, respectively, and the third originates from the input branching fraction of $\bar K_{1}(1270)^0\to K^- π^+π^0$ or $K_1(1270)^-\to K^-π^+π^-$. Combining our branching fractions with the previous measurements of ${\mathcal B}[D^+\to \bar K_1(1270)^0e^+ν_{e}]$ and ${\mathcal B}[D^0\to K_1(1270)^-e^+ν_{e}]$, we determine the branching fraction ratios to be ${\mathcal B}[D^+\to \bar K_1(1270)^0μ^+ν_μ]/{\mathcal B}[D^+\to \bar K_1(1270)^0e^+ν_{e}]=1.03 \pm 0.14 \substack{+0.11\\-0.15}$ and ${\mathcal B}[D^0\to K_1(1270)^-μ^+ν_μ]/{\mathcal B}[D^0\to K_1(1270)^-e^+ν_{e}]=0.74\pm 0.13 \substack{+0.08\\-0.13}$. Using the branching fractions measured in this work and the world-average lifetimes of the $D^+$ and $D^0$ mesons, we determine the semimuonic partial decay width ratio to be $Γ[D^+\to \bar K_1(1270)^0 μ^+ν_μ]/Γ[D^0\to K_1(1270)^- μ^+ν_μ]=1.22\pm 0.10\substack{+0.06\\-0.09}$, which is consistent with unity as predicted by isospin conservation.
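The partial decay width ratio quoted above follows from the standard relation between a partial width, a branching fraction, and the parent lifetime, $Γ_i = \hbar\,{\mathcal B}_i/τ$. A sketch of that step, where $τ_{D^+}$ and $τ_{D^0}$ are notation introduced here for the world-average lifetimes and do not appear in the abstract:

$$
\frac{Γ[D^+\to \bar K_1(1270)^0 μ^+ν_μ]}{Γ[D^0\to K_1(1270)^- μ^+ν_μ]}
= \frac{{\mathcal B}[D^+\to \bar K_1(1270)^0 μ^+ν_μ]}{{\mathcal B}[D^0\to K_1(1270)^- μ^+ν_μ]}
\times \frac{τ_{D^0}}{τ_{D^+}}
$$

The factors of $\hbar$ cancel in the ratio, so only the two measured branching fractions and the lifetime ratio enter.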
Submitted 18 April, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Authors:
Chengfeng Zhou,
Ji Wang,
Juanjuan Qin,
Yining Wang,
Ling Sun,
Weiwei Dai
Abstract:
Large language models (LLMs) have shown significant promise across various medical applications, with ophthalmology being a notable area of focus. Many ophthalmic tasks have shown substantial improvement through the integration of LLMs. However, before these models can be widely adopted in clinical practice, evaluating their capabilities and identifying their limitations is crucial. To address this research gap and support the real-world application of LLMs, we introduce the OphthBench, a specialized benchmark designed to assess LLM performance within the context of Chinese ophthalmic practices. This benchmark systematically divides a typical ophthalmic clinical workflow into five key scenarios: Education, Triage, Diagnosis, Treatment, and Prognosis. For each scenario, we developed multiple tasks featuring diverse question types, resulting in a comprehensive benchmark comprising 9 tasks and 591 questions. This comprehensive framework allows for a thorough assessment of LLMs' capabilities and provides insights into their practical application in Chinese ophthalmology. Using this benchmark, we conducted extensive experiments and analyzed the results from 39 popular LLMs. Our evaluation highlights the current gap between LLM development and its practical utility in clinical settings, providing a clear direction for future advancements. By bridging this gap, we aim to unlock the potential of LLMs and advance their development in ophthalmology.
Submitted 3 February, 2025;
originally announced February 2025.
-
Dual Alignment Maximin Optimization for Offline Model-based RL
Authors:
Chi Zhou,
Wang Luo,
Haoran Li,
Congying Han,
Tiande Guo,
Zicheng Zhang
Abstract:
Offline reinforcement learning agents face significant deployment challenges due to the synthetic-to-real distribution mismatch. While most prior research has focused on improving the fidelity of synthetic sampling and incorporating off-policy mechanisms, the directly integrated paradigm often fails to ensure consistent policy behavior in biased models and underlying environmental dynamics, which inherently arise from discrepancies between behavior and learning policies. In this paper, we first shift the focus from model reliability to policy discrepancies while optimizing for expected returns, and then self-consistently incorporate synthetic data, deriving a novel actor-critic paradigm, Dual Alignment Maximin Optimization (DAMO). It is a unified framework to ensure both model-environment policy consistency and synthetic and offline data compatibility. The inner minimization performs dual conservative value estimation, aligning policies and trajectories to avoid out-of-distribution states and actions, while the outer maximization ensures that policy improvements remain consistent with inner value estimates. Empirical evaluations demonstrate that DAMO effectively ensures model and policy alignments, achieving competitive performance across diverse benchmark tasks.
Submitted 10 May, 2025; v1 submitted 2 February, 2025;
originally announced February 2025.
-
Fantastic Multi-Task Gradient Updates and How to Find Them In a Cone
Authors:
Negar Hassanpour,
Muhammad Kamran Janjua,
Kunlin Zhang,
Sepehr Lavasani,
Xiaowen Zhang,
Chunhua Zhou,
Chao Gao
Abstract:
Balancing competing objectives remains a fundamental challenge in multi-task learning (MTL), primarily due to conflicting gradients across individual tasks. A common solution relies on computing a dynamic gradient update vector that balances competing tasks as optimization progresses. Building on this idea, we propose ConicGrad, a principled, scalable, and robust MTL approach formulated as a constrained optimization problem. Our method introduces an angular constraint to dynamically regulate gradient update directions, confining them within a cone centered on the reference gradient of the overall objective. By balancing task-specific gradients without over-constraining their direction or magnitude, ConicGrad effectively resolves inter-task gradient conflicts. Moreover, our framework ensures computational efficiency and scalability to high-dimensional parameter spaces. We conduct extensive experiments on standard supervised learning and reinforcement learning MTL benchmarks, and demonstrate that ConicGrad achieves state-of-the-art performance across diverse tasks.
Submitted 31 January, 2025;
originally announced February 2025.
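The angular constraint mentioned in the ConicGrad abstract above can be illustrated with a minimal sketch. It only checks whether a candidate update direction lies inside a cone around a reference gradient; it is not the authors' optimization procedure. The function names, the use of the mean task gradient as the reference, and the `min_cos` threshold are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_gradient(task_grads):
    """Assumed reference gradient: the average of the per-task gradients."""
    n = len(task_grads)
    return [sum(column) / n for column in zip(*task_grads)]

def in_cone(update, reference, min_cos):
    """True if `update` is within the cone around `reference`.

    `min_cos` (hypothetical parameter) is the cosine of the cone half-angle.
    """
    return cosine(update, reference) >= min_cos

# Two conflicting task gradients and their averaged reference direction.
task_grads = [[1.0, 0.0], [0.0, 1.0]]
reference = mean_gradient(task_grads)
```

A larger `min_cos` narrows the cone and keeps the update closer to the reference direction; a smaller value leaves more room to favor individual tasks.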
-
Subtle variations in stiff dimensions of brain networks account for individual differences in cognitive ability
Authors:
Sida Chen,
Qianyuan Tang,
Taro Toyoizumi,
Werner Sommer,
Lianchun Yu,
Changsong Zhou
Abstract:
Explaining individual differences in cognitive abilities requires both identifying brain parameters that vary across individuals and understanding how brain networks are recruited for specific tasks. Typically, task performance relies on the integration and segregation of functional subnetworks, often captured by parameters like regional excitability and connectivity. Yet, the high dimensionality of these parameters hinders pinpointing their functional relevance. Here, we apply stiff-sloppy analysis to human brain data, revealing that certain subtle parameter combinations ("stiff dimensions") powerfully influence neural activity during task processing, whereas others ("sloppy dimensions") vary more extensively but exert minimal impact. Using a pairwise maximum entropy model of task fMRI, we show that even small deviations in stiff dimensions, derived through Fisher Information Matrix analysis, govern the dynamic interplay of segregation and integration between the default mode network (DMN) and a working memory network (WMN). Crucially, separating a 0-back task (vigilant attention) from a 2-back task (working memory updating) uncovers partially distinct stiff dimensions predicting performance in each condition, along with a global DMN-WMN segregation shared across both tasks. Altogether, stiff-sloppy analysis challenges the conventional focus on large parameter variability by highlighting these subtle yet functionally decisive parameter combinations.
Submitted 27 April, 2025; v1 submitted 31 January, 2025;
originally announced January 2025.
-
Visual Autoregressive Modeling for Image Super-Resolution
Authors:
Yunpeng Qu,
Kun Yuan,
Jinhua Hao,
Kai Zhao,
Qizhi Xie,
Ming Sun,
Chao Zhou
Abstract:
Image Super-Resolution (ISR) has seen significant progress with the introduction of remarkable generative models. However, challenges such as the trade-off between fidelity and realism, as well as computational complexity, have limited their application. Building upon the tremendous success of autoregressive models in the language domain, we propose VARSR, a novel visual autoregressive modeling framework for ISR in the form of next-scale prediction. To effectively integrate and preserve semantic information in low-resolution images, we propose using prefix tokens to incorporate the condition. Scale-aligned Rotary Positional Encodings are introduced to capture spatial structures, and a diffusion refiner is utilized for modeling quantization residual loss to achieve pixel-level fidelity. Image-based Classifier-free Guidance is proposed to guide the generation of more realistic images. Furthermore, we collect large-scale data and design a training process to obtain robust generative priors. Quantitative and qualitative results show that VARSR is capable of generating high-fidelity and high-realism images with more efficiency than diffusion-based methods. Our code will be released at https://github.com/qyp2000/VARSR.
Submitted 31 January, 2025;
originally announced January 2025.
-
Equivariant Hypergraph Diffusion for Crystal Structure Prediction
Authors:
Yang Liu,
Chuan Zhou,
Shuai Zhang,
Peng Zhang,
Xixun Lin,
Shirui Pan
Abstract:
Crystal Structure Prediction (CSP) remains a fundamental challenge with significant implications for the development of new materials and the advancement of various scientific disciplines. Recent developments have shown that generative models, particularly diffusion models, hold great promise for CSP. However, traditional graph-based representations, where atomic bonds are modeled as pairwise graph edges, fail to fully capture the intricate high-order interactions essential for accurately representing crystal structures. In this work, we propose a novel approach that utilizes hypergraphs to represent crystal structures, providing a more expressive abstraction for modeling multi-way atomic interactions. By adopting hypergraphs, we can effectively capture complex high-order relationships and symmetries, such as permutation and periodic translation invariance, which are crucial for characterizing crystal structures. Specifically, we propose the Equivariant Hypergraph Diffusion Model (EH-Diff), a generative model designed to take advantage of the symmetry-preserving properties of hypergraphs. EH-Diff exploits these features to offer an efficient and accurate method for predicting crystal structures with a strong theoretical justification to preserve invariance properties. Empirically, we conduct extensive experiments on four benchmark datasets, and the results demonstrate that EH-Diff outperforms state-of-the-art CSP methods with only one sample.
Submitted 30 January, 2025;
originally announced January 2025.
-
Reinforcement-Learning Portfolio Allocation with Dynamic Embedding of Market Information
Authors:
Jinghai He,
Cheng Hua,
Chunyang Zhou,
Zeyu Zheng
Abstract:
We develop a portfolio allocation framework that leverages deep learning techniques to address challenges arising from high-dimensional, non-stationary, and low-signal-to-noise market information. Our approach includes a dynamic embedding method that reduces the non-stationary, high-dimensional state space into a lower-dimensional representation. We design a reinforcement learning (RL) framework that integrates generative autoencoders and online meta-learning to dynamically embed market information, enabling the RL agent to focus on the most impactful parts of the state space for portfolio allocation decisions. Empirical analysis based on the top 500 U.S. stocks demonstrates that our framework outperforms common portfolio benchmarks and the predict-then-optimize (PTO) approach using machine learning, particularly during periods of market stress. Traditional factor models do not fully explain this superior performance. The framework's ability to time volatility reduces its market exposure during turbulent times. Ablation studies confirm the robustness of this performance across various reinforcement learning algorithms. Additionally, the embedding and meta-learning techniques effectively manage the complexities of high-dimensional, noisy, and non-stationary financial data, enhancing both portfolio performance and risk management.
Submitted 29 January, 2025;
originally announced January 2025.
-
Neural Spelling: A Spell-Based BCI System for Language Neural Decoding
Authors:
Xiaowei Jiang,
Charles Zhou,
Yiqun Duan,
Ziyi Zhao,
Thomas Do,
Chin-Teng Lin
Abstract:
Brain-computer interfaces (BCIs) present a promising avenue by translating neural activity directly into text, eliminating the need for physical actions. However, existing non-invasive BCI systems have not successfully covered the entire alphabet, limiting their practicality. In this paper, we propose a novel non-invasive EEG-based BCI system with a Curriculum-based Neural Spelling Framework, which first recognizes all 26 alphabet letters by decoding neural signals associated with handwriting, and then applies Generative AI (GenAI) to enhance spell-based neural language decoding tasks. Our approach combines the ease of handwriting with the accessibility of EEG technology, utilizing advanced neural decoding algorithms and pre-trained large language models (LLMs) to translate EEG patterns into text with high accuracy. This system shows how GenAI can improve the performance of typical spelling-based neural language decoding tasks, and addresses the limitations of previous methods, offering a scalable and user-friendly solution for individuals with communication impairments, thereby enhancing inclusive communication options.
Submitted 29 January, 2025;
originally announced January 2025.
-
One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning
Authors:
Chunpeng Zhou,
Qianqian Shen,
Zhi Yu,
Jiajun Bu,
Haishuai Wang
Abstract:
Recent advancements in fine-tuning Vision-Language Foundation Models (VLMs) have garnered significant attention for their effectiveness in downstream few-shot learning tasks. While these recent approaches exhibit some performance improvements, they often suffer from excessive training parameters and high computational costs. To address these challenges, we propose a novel Block matrix-based low-rank adaptation framework, called Block-LoRA, for fine-tuning VLMs on downstream few-shot tasks. Inspired by recent work on Low-Rank Adaptation (LoRA), Block-LoRA partitions the original low-rank decomposition matrix of LoRA into a series of sub-matrices while sharing all down-projection sub-matrices. This structure not only reduces the number of training parameters, but also transforms certain complex matrix multiplication operations into simpler matrix addition, significantly lowering the computational cost of fine-tuning. Notably, Block-LoRA enables fine-tuning CLIP on the ImageNet few-shot benchmark using a single 24GB GPU. We also show that Block-LoRA has a tighter generalization error bound than vanilla LoRA. Without bells and whistles, extensive experiments demonstrate that Block-LoRA achieves competitive performance compared to state-of-the-art CLIP-based few-shot methods, while maintaining a low training parameter count and reduced computational overhead.
Submitted 28 January, 2025;
originally announced January 2025.
-
UNIDOOR: A Universal Framework for Action-Level Backdoor Attacks in Deep Reinforcement Learning
Authors:
Oubo Ma,
Linkang Du,
Yang Dai,
Chunyi Zhou,
Qingming Li,
Yuwen Pu,
Shouling Ji
Abstract:
Deep reinforcement learning (DRL) is widely applied to safety-critical decision-making scenarios. However, DRL is vulnerable to backdoor attacks, especially action-level backdoors, which pose significant threats through precise manipulation and flexible activation, risking outcomes like vehicle collisions or drone crashes. The key distinction of action-level backdoors lies in the utilization of the backdoor reward function to associate triggers with target actions. Nevertheless, existing studies typically rely on backdoor reward functions with fixed values or conditional flipping, which lack universality across diverse DRL tasks and backdoor designs, resulting in fluctuations or even failure in practice.
This paper proposes the first universal action-level backdoor attack framework, called UNIDOOR, which enables adaptive exploration of backdoor reward functions through performance monitoring, eliminating the reliance on expert knowledge and grid search. We highlight that action tampering serves as a crucial component of action-level backdoor attacks in continuous action scenarios, as it addresses attack failures caused by low-frequency target actions. Extensive evaluations demonstrate that UNIDOOR significantly enhances the attack performance of action-level backdoors, showcasing its universality across diverse attack scenarios, including single/multiple agents, single/multiple backdoors, discrete/continuous action spaces, and sparse/dense reward signals. Furthermore, visualization results encompassing state distribution, neuron activation, and animations demonstrate the stealthiness of UNIDOOR. The source code of UNIDOOR can be found at https://github.com/maoubo/UNIDOOR.
Submitted 26 January, 2025;
originally announced January 2025.
-
Observation of $h_{c}$ radiative decays to multiple light hadrons and the tensor state $f_2(1270)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (666 additional authors not shown)
Abstract:
Using $ψ(3686)\rightarrow π^{0} h_{c}$ decays from a data sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider, $h_c$ radiative decays to $γπ^{+}π^{-},~γπ^{+}π^{-}η,~\gamma2(π^{+}π^{-})$, and $γp\bar{p}$ are observed for the first time, each with a significance greater than $5σ$. The corresponding branching fractions are measured. Furthermore, intermediate states below 2.8 GeV/$c^{2}$ are investigated, leading to the first observation of the decay process of $h_c\rightarrowγf_{2}(1270)\rightarrowγπ^{+}π^{-}$ with a significance of $5.5\,σ$. This observation represents the first instance of $h_c$ radiative decay to a tensor state.
Submitted 26 January, 2025;
originally announced January 2025.
-
Dynamic Adaptation in Data Storage: Real-Time Machine Learning for Enhanced Prefetching
Authors:
Chiyu Cheng,
Chang Zhou,
Yang Zhao,
Jin Cao
Abstract:
The exponential growth of data storage demands has necessitated the evolution of hierarchical storage management strategies [1]. This study explores the application of streaming machine learning [3] to revolutionize data prefetching within multi-tiered storage systems. Unlike traditional batch-trained models, streaming machine learning [5] offers adaptability, real-time insights, and computational efficiency, responding dynamically to workload variations. This work designs and validates an innovative framework that integrates streaming classification models for predicting file access patterns, specifically the next file offset. Leveraging comprehensive feature engineering and real-time evaluation over extensive production traces, the proposed methodology achieves substantial improvements in prediction accuracy, memory efficiency, and system adaptability. The results underscore the potential of streaming models in real-time storage management, setting a precedent for advanced caching and tiering strategies.
Submitted 28 January, 2025; v1 submitted 29 December, 2024;
originally announced January 2025.
-
Optimizing SSD Caches for Cloud Block Storage Systems Using Machine Learning Approaches
Authors:
Chiyu Cheng,
Chang Zhou,
Yang Zhao,
Jin Cao
Abstract:
The growing demand for efficient cloud storage solutions has led to the widespread adoption of Solid-State Drives (SSDs) for caching in cloud block storage systems. The management of data writes to SSD caches plays a crucial role in improving overall system performance, reducing latency, and extending the lifespan of storage devices. A critical challenge arises from the large volume of write-only data, which significantly impacts the performance of SSD caches when handled inefficiently. Specifically, writes that have not been read for a certain period may introduce unnecessary write traffic to the SSD cache without offering substantial benefits for cache performance. This paper proposes a novel approach to mitigate this issue by leveraging machine learning techniques to dynamically optimize the write policy in cloud-based storage systems. The proposed method identifies write-only data and selectively filters it out in real-time, thereby minimizing the number of unnecessary write operations and improving the overall performance of the cache system. Experimental results demonstrate that the proposed machine learning-based policy significantly outperforms traditional approaches by reducing the number of harmful writes and optimizing cache utilization. This solution is particularly suitable for cloud environments with varying and unpredictable workloads, where traditional cache management strategies often fall short.
Submitted 28 January, 2025; v1 submitted 29 December, 2024;
originally announced January 2025.
-
Comprehensive Analog Signal Processing Platform Enabled with Acoustic Charge Transport in Two-dimensional Materials
Authors:
Yueyi Sun,
Siming Liu,
Yingjie Luo,
Jiwei Chen,
Yihong Sun,
Changjian Zhou
Abstract:
Two-dimensional Acoustic Charge Transport (2D-ACT) devices, which integrate a two-dimensional semiconductor field-effect transistor (FET) with a high-frequency surface acoustic wave (SAW) device, provide a potentially compact platform for processing analog signals in a wireless, non-contact, low-loss, and real-time manner. They are expected to be used in long-distance space communication and sensing. However, current investigations into 2D-ACT devices are still limited to the observation of DC acoustoelectric currents, and have yet to achieve real-time electronic signal processing capabilities. In this paper, we design a hybrid acoustoelectric platform composed of a two-dimensional semiconductor FET and a SAW device. The platform is capable of processing DC signals, exhibiting ambipolar transport behavior. The sub-wavelength channel length of the FET within the platform allows for the real-time observation of carrier distribution at a microscopic scale in conjunction with the SAW potential, and facilitates the reproduction and intensity regulation of AC signals. By adjusting the relative phase and intensity ratio of two counter-propagating SAWs, the platform also enables the addition and subtraction of AC signals.
Submitted 27 January, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
Revisit Self-Debugging with Self-Generated Tests for Code Generation
Authors:
Xiancai Chen,
Zhengwei Tao,
Kechi Zhang,
Changzhi Zhou,
Wanli Gu,
Yuanpeng He,
Mengdi Zhang,
Xunliang Cai,
Haiyan Zhao,
Zhi Jin
Abstract:
Large language models (LLMs) have shown significant advancements in code generation, but still face challenges on tasks beyond their basic capabilities. Recently, the notion of self-debugging has been proposed to boost the performance of code generation by leveraging execution feedback from tests. Despite its promise, the availability of high-quality tests in real-world scenarios is limited. In this context, self-debugging with self-generated tests is a promising solution but lacks a full exploration of its limitations and practical potential. Therefore, we investigate its efficacy on diverse programming problems. To deepen our understanding, we propose two distinct paradigms for the process: post-execution and in-execution self-debugging. Within the scope of self-contained Python programming tasks, we find that post-execution self-debugging struggles on basic problems but shows potential for improvement on competitive ones, due to the bias introduced by self-generated tests. On the other hand, in-execution self-debugging enables LLMs to mitigate the bias by solely leveraging intermediate states during execution, thereby enhancing code generation.
Submitted 22 January, 2025;
originally announced January 2025.
-
FinSphere: A Conversational Stock Analysis Agent Equipped with Quantitative Tools based on Real-Time Database
Authors:
Shijie Han,
Changhai Zhou,
Yiqing Shen,
Tianning Sun,
Yuhua Zhou,
Xiaoxia Wang,
Zhixiao Yang,
Jingshu Zhang,
Hongguang Li
Abstract:
Current financial Large Language Models (LLMs) struggle with two critical limitations: a lack of depth in stock analysis, which impedes their ability to generate professional-grade insights, and the absence of objective evaluation metrics to assess the quality of stock analysis reports. To address these challenges, this paper introduces FinSphere, a conversational stock analysis agent, along with three major contributions: (1) Stocksis, a dataset curated by industry experts to enhance LLMs' stock analysis capabilities, (2) AnalyScore, a systematic evaluation framework for assessing stock analysis quality, and (3) FinSphere, an AI agent that can generate high-quality stock analysis reports in response to user queries. Experiments demonstrate that FinSphere achieves superior performance compared to both general and domain-specific LLMs, as well as existing agent-based systems, even when they are enhanced with real-time data access and few-shot guidance. The integrated framework, which combines real-time data feeds, quantitative tools, and an instruction-tuned LLM, yields substantial improvements in both analytical quality and practical applicability for real-world stock analysis.
Submitted 8 January, 2025;
originally announced January 2025.
-
Each Graph is a New Language: Graph Learning with LLMs
Authors:
Huachi Zhou,
Jiahe Du,
Chuang Zhou,
Chang Yang,
Yilin Xiao,
Yuxuan Xie,
Xiao Huang
Abstract:
Recent efforts leverage Large Language Models (LLMs) for modeling text-attributed graph structures in node classification tasks. These approaches describe graph structures for LLMs to understand or aggregate LLM-generated textual attribute embeddings through graph structure. However, these approaches face two main limitations in modeling graph structures with LLMs. (i) Graph descriptions become verbose in describing high-order graph structure. (ii) Textual attributes alone do not contain adequate graph structure information. It is challenging to model graph structure concisely and adequately with LLMs. LLMs lack built-in mechanisms to model graph structures directly. They also struggle with complex long-range dependencies between high-order nodes and target nodes.
Inspired by the observation that LLMs pre-trained on one language can achieve exceptional performance on another with minimal additional training, we propose Graph-Defined Language for Large Language Model (GDL4LLM). This novel framework enables LLMs to transfer their powerful language understanding capabilities to graph-structured data. GDL4LLM translates graphs into a graph language corpus instead of graph descriptions and pre-trains LLMs on this corpus to adequately understand graph structures. During fine-tuning, this corpus describes the structural information of target nodes concisely with only a few tokens. By treating graphs as a new language, GDL4LLM enables LLMs to model graph structures adequately and concisely for node classification tasks. Extensive experiments on three real-world datasets demonstrate that GDL4LLM outperforms description-based and textual attribute embeddings-based baselines by efficiently modeling different orders of graph structure with LLMs.
Submitted 25 May, 2025; v1 submitted 20 January, 2025;
originally announced January 2025.
-
Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores
Authors:
Haisha Zhao,
San Li,
Jiaheng Wang,
Chunbao Zhou,
Jue Wang,
Zhikuang Xin,
Shunde Li,
Zhiqiang Liang,
Zhijie Pan,
Fang Liu,
Yan Zeng,
Yangang Wang,
Xuebin Chi
Abstract:
General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM acceleration. However, fully exploiting the hardware requires systematic optimization. In this paper, we propose Acc-SpMM, a high-performance SpMM library on TCs, with multiple optimizations, including data-affinity-based reordering, a memory-efficient compressed format, a high-throughput pipeline, and adaptive sparsity-aware load balancing. Evaluated against state-of-the-art SpMM kernels on various NVIDIA GPU architectures with a diverse range of benchmark matrices, Acc-SpMM achieves significant performance improvements: over cuSPARSE, it delivers an average speedup of 2.52x (up to 5.11x) on RTX 4090, 1.91x (up to 4.68x) on A800, and 1.58x (up to 3.60x) on H100.
Submitted 15 January, 2025;
originally announced January 2025.
-
A Partial Initialization Strategy to Mitigate the Overfitting Problem in CATE Estimation with Hidden Confounding
Authors:
Chuan Zhou,
Yaxuan Li,
Chunyuan Zheng,
Haiteng Zhang,
Haoxuan Li,
Mingming Gong
Abstract:
Estimating the conditional average treatment effect (CATE) from observational data plays a crucial role in areas such as e-commerce, healthcare, and economics. Existing studies mainly rely on the strong ignorability assumption that there are no hidden confounders, whose existence cannot be tested from observational data and can invalidate any causal conclusion. In contrast, data collected from randomized controlled trials (RCTs) do not suffer from confounding but are usually limited by a small sample size. To avoid overfitting caused by the small-scale RCT data, we propose a novel two-stage pretraining-finetuning (TSPF) framework with a partial parameter initialization strategy to estimate the CATE in the presence of hidden confounding. In the first stage, a foundational representation of covariates is trained to estimate counterfactual outcomes through large-scale observational data. In the second stage, we propose to train an augmented representation of the covariates, which is concatenated with the foundational representation obtained in the first stage to adjust for the hidden confounding. Rather than training a separate network from scratch, some of the prediction heads are initialized from the first stage. The superiority of our approach is validated through extensive experiments on two datasets.
Submitted 25 January, 2025; v1 submitted 15 January, 2025;
originally announced January 2025.