-
New insight into the hard X-ray emission influenced by the type-I bursts observed by Insight-HXMT during outburst of 4U 1636--536
Authors:
J. Q. Peng,
S. Zhang,
Y. P. Chen,
L. D. Kong,
P. J. Wang,
S. N. Zhang,
Q. C. Shui,
L. Ji,
G. B. Zhang,
Z. Yan,
L. Tao,
J. L. Qu,
M. Y. Ge,
Z. L. Yu,
J. Li,
Z. Chang,
Z. S. Li,
P. Zhang,
Y. X. Xiao,
S. J. Zhao
Abstract:
By analyzing the data from Insight-HXMT and NICER, we determine the evolution of the significance of the hard X-ray shortage in 4U 1636--536 with its spectral state, as well as the evolution of the fraction of deficit with energy. Additionally, we investigate the possible geometry and evolution of the corona in 4U 1636--536 by combining our findings with the results of spectral analysis. We find that during the soft state, the significance of possible hard X-ray shortage in bursts is almost zero. However, in the hard state, some bursts exhibit significant shortages (>3 $σ$), while others do not. We attempt to establish a correlation between the significance of the hard X-ray shortage and the spectral parameters, but the data quality and the limited number of bursts prevent us from finding a strong correlation. For bursts with insignificant shortages in the soft state, the fraction of deficit remains small. However, in the hard state, the fraction of deficit for all bursts increases with energy, regardless of the significance of the shortage of individual bursts. For bursts during the hard state, we investigate the evolution of the fraction of deficit during the bursts by stacking the peaks and decays of the bursts, respectively, and find that as the flux of the bursts decreases, the energy corresponding to the maximum of the fraction of deficit becomes progressively higher. We explore the possible geometry and evolution of the corona suggested by the evolution of the fraction of deficit, which is obtained from the spectral and temporal analysis.
Submitted 4 March, 2025;
originally announced March 2025.
-
Segmenting Bi-Atrial Structures Using ResNext Based Framework
Authors:
Malitha Gunawardhana,
Fangqiang Xu,
Jichao Zhao
Abstract:
Atrial fibrillation (AF) is the most common cardiac arrhythmia, significantly contributing to mortality, particularly in older populations. While pulmonary vein isolation is a standard treatment, its effectiveness is limited in patients with persistent AF. Recent research highlights the importance of targeting additional atrial regions, particularly fibrotic areas identified via late gadolinium-enhanced MRI (LGE-MRI). However, existing manual segmentation methods are time-consuming and prone to variability. Deep learning techniques, particularly convolutional neural networks (CNNs), have shown promise in automating segmentation. However, most studies focus solely on the left atrium (LA) and rely on small datasets, limiting generalizability. In this paper, we propose a novel two-stage framework incorporating ResNeXt encoders and a cyclic learning rate to segment both the right atrium (RA) and LA walls and cavities in LGE-MRIs. Our method aims to improve the segmentation of challenging small structures, such as atrial walls, while maintaining high performance in larger regions like the atrial cavities. The results demonstrate that our approach offers superior segmentation accuracy and robustness compared to traditional architectures, particularly for imbalanced class structures.
Submitted 26 March, 2025; v1 submitted 28 February, 2025;
originally announced March 2025.
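The abstract above mentions training with a cyclic learning rate but does not say which schedule is used. As a rough illustration only, the sketch below implements the common triangular policy; the function name `triangular_lr` and the values of `base_lr`, `max_lr`, and `half_cycle` are assumptions for the example, not settings taken from the paper.

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-3, half_cycle=100):
    """Triangular cyclic learning rate (illustrative, hypothetical defaults).

    The rate climbs linearly from base_lr to max_lr over half_cycle steps,
    descends linearly back to base_lr over the next half_cycle steps,
    and then the cycle repeats.
    """
    cycle_pos = step % (2 * half_cycle)
    # frac rises from 0 to 1 over the first half of the cycle
    # and falls back to 0 over the second half.
    frac = 1.0 - abs(cycle_pos / half_cycle - 1.0)
    return base_lr + (max_lr - base_lr) * frac


if __name__ == "__main__":
    for s in (0, 50, 100, 150, 200):
        print(s, triangular_lr(s))
```

Periodically raising the rate is usually motivated as a way to help the optimizer leave poor regions of the loss surface; whether that is the authors' motivation here is not stated in the abstract.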
-
Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance
Authors:
Jiayi Zhao,
Fei Teng,
Kai Luo,
Guoqiang Zhao,
Zhiyong Li,
Xu Zheng,
Kailun Yang
Abstract:
The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, demonstrates strong potential in perception tasks, its inherent training paradigm prevents it from being suitable for RGB-T tasks. To address these challenges, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unlocks the potential of SAM2 with linguistic guidance for efficient RGB-Thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines it with category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% on PST900 and 67.8% on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, effectively mitigating the high costs associated with data collection while endowing robotic systems with comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
Submitted 4 March, 2025;
originally announced March 2025.
-
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits
Authors:
Zichun Ye,
Chihao Zhang,
Jiahao Zhao
Abstract:
We study the problem of minimizing gap-dependent regret for single-pass streaming stochastic multi-armed bandits (MAB). In this problem, the $n$ arms are present in a stream, and at most $m<n$ arms and their statistics can be stored in the memory. We establish tight non-asymptotic regret bounds regarding all relevant parameters, including the number of arms $n$, the memory size $m$, the number of rounds $T$ and $(Δ_i)_{i\in [n]}$ where $Δ_i$ is the reward mean gap between the best arm and the $i$-th arm. These gaps are not known in advance by the player. Specifically, for any constant $α\ge 1$, we present two algorithms: one applicable for $m\ge \frac{2}{3}n$ with regret at most $O_α\Big(\frac{(n-m)T^{\frac{1}{α+ 1}}}{n^{1 + {\frac{1}{α+ 1}}}}\displaystyle\sum_{i:Δ_i > 0}Δ_i^{1 - 2α}\Big)$ and another applicable for $m<\frac{2}{3}n$ with regret at most $O_α\Big(\frac{T^{\frac{1}{α+1}}}{m^{\frac{1}{α+1}}}\displaystyle\sum_{i:Δ_i > 0}Δ_i^{1 - 2α}\Big)$. We also prove matching lower bounds for both cases by showing that for any constant $α\ge 1$ and any $m\leq k < n$, there exists a set of hard instances on which the regret of any algorithm is $Ω_α\Big(\frac{(k-m+1) T^{\frac{1}{α+1}}}{k^{1 + \frac{1}{α+1}}} \sum_{i:Δ_i > 0}Δ_i^{1-2α}\Big)$. This is the first tight gap-dependent regret bound for streaming MAB. Prior to our work, an $O\Big(\sum_{i\colon Δ_i>0} \frac{\sqrt{T}\log T}{Δ_i}\Big)$ upper bound for the special case of $α=1$ and $m=O(1)$ was established by Agarwal, Khanna and Patil (COLT'22). In contrast, our results provide the correct order of regret as $Θ\Big(\frac{1}{\sqrt{m}}\sum_{i\colon Δ_i>0}\frac{\sqrt{T}}{Δ_i}\Big)$.
Submitted 4 March, 2025;
originally announced March 2025.
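As a quick consistency check on the bounds quoted in the abstract above, substituting $α = 1$ into the two upper bounds shows how the small-memory case reduces to the $Θ$ expression in its final sentence. This is only a direct substitution into the formulas as printed, not a derivation from the paper.

```latex
% Substituting \alpha = 1, so that 1/(\alpha+1) = 1/2 and 1 - 2\alpha = -1.
\begin{align*}
  m < \tfrac{2}{3}n:\quad
  \frac{T^{\frac{1}{\alpha+1}}}{m^{\frac{1}{\alpha+1}}}
  \sum_{i:\Delta_i>0} \Delta_i^{1-2\alpha}
  \;\Big|_{\alpha=1}
  &= \frac{\sqrt{T}}{\sqrt{m}} \sum_{i:\Delta_i>0} \frac{1}{\Delta_i}
   = \frac{1}{\sqrt{m}} \sum_{i:\Delta_i>0} \frac{\sqrt{T}}{\Delta_i}, \\[4pt]
  m \ge \tfrac{2}{3}n:\quad
  \frac{(n-m)\,T^{\frac{1}{\alpha+1}}}{n^{1+\frac{1}{\alpha+1}}}
  \sum_{i:\Delta_i>0} \Delta_i^{1-2\alpha}
  \;\Big|_{\alpha=1}
  &= \frac{(n-m)\sqrt{T}}{n^{3/2}} \sum_{i:\Delta_i>0} \frac{1}{\Delta_i}.
\end{align*}
```

The first line matches the stated $Θ\big(\frac{1}{\sqrt{m}}\sum_i \frac{\sqrt{T}}{Δ_i}\big)$ order and, for $m = O(1)$, differs from the earlier $O\big(\sum_i \frac{\sqrt{T}\log T}{Δ_i}\big)$ bound by the $\log T$ factor.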
-
First Measurement of the Decay Dynamics in the Semileptonic Transition of the $D^{+(0)}$ into the Axial-vector Meson $\bar K_1(1270)$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai, et al. (680 additional authors not shown)
Abstract:
Using $e^+e^-$ collision data taken at the center-of-mass energy of 3.773 GeV with the BESIII detector, corresponding to an integrated luminosity of 20.3 fb$^{-1}$, we report the first amplitude and angular analyses of the semileptonic decays $D^{+(0)}\to K^-π^+π^{0(-)} e^+ν_e$. From the amplitude analysis, we determine for the first time the hadronic form factors of the semileptonic $D$ decays into the axial-vector meson $\bar{K}_1(1270)$ to be $r_A=(-11.2\pm1.0\pm0.9)\times10^{-2}$ and $r_V = (-4.3\pm 1.0\pm2.4)\times 10^{-2}$. The angular analysis yields an up-down asymmetry $\mathcal{A}^\prime_{ud} = 0.01\pm0.11$, which is consistent with the Standard Model prediction.
Submitted 3 March, 2025;
originally announced March 2025.
-
Streaming Piano Transcription Based on Consistent Onset and Offset Decoding with Sustain Pedal Detection
Authors:
Weixing Wei,
Jiahao Zhao,
Yulun Wu,
Kazuyoshi Yoshii
Abstract:
This paper describes a streaming audio-to-MIDI piano transcription approach that aims to sequentially translate a music signal into a sequence of note onset and offset events. The sequence-to-sequence nature of this task may call for the computationally-intensive transformer model for better performance, which has recently been used for offline transcription benchmarks and could be extended for streaming transcription with causal attention mechanisms. We assume that the performance limitation of this naive approach lies in the decoder. Although time-frequency features useful for onset detection are considerably different from those for offset detection, the single decoder is trained to output a mixed sequence of onset and offset events without guarantee of the correspondence between the onset and offset events of the same note. To overcome this limitation, we propose a streaming encoder-decoder model that uses a convolutional encoder aggregating local acoustic features, followed by an autoregressive Transformer decoder detecting a variable number of onset events and another decoder detecting the offset events for the active pitches with validation of the sustain pedal at each time frame. Experiments using the MAESTRO dataset showed that the proposed streaming method performed comparably with or even better than the state-of-the-art offline methods while significantly reducing the computational cost.
Submitted 3 March, 2025;
originally announced March 2025.
-
Coexistence of topological surface states and superconductivity in Dirac semimetal NiTe$_2$
Authors:
Chen He,
Jian-Zhou Zhao,
Mei Du,
Luo-Zhao Zhang,
Jia-Ying Zhang,
Kuo Yang,
Noah F. Q. Yuan,
Aleksandr Seliverstov,
Ewald Janssens,
Jun-Yi Ge,
Zhe Li
Abstract:
The coexistence of topological bands around the Fermi level ($E_F$) and superconductivity provides a fundamental platform for exploring their interplay. However, few materials inherently display both properties. In this study, we demonstrate the coexistence of topological surface states at the $E_F$ and superconductivity in NiTe$_2$ single crystals, a material hitherto not recognized as superconducting. Quasiparticle interference measurements performed via scanning tunneling microscopy suggest the presence of topological surface states at the $E_F$, which is further corroborated by density functional theory simulations. Experimental evidence for superconductivity is provided via electronic transport measurements and specific heat capacity analyses. Our results suggest that NiTe$_2$ represents a promising platform for investigating the rich interplay between topological states and superconductivity.
Submitted 3 March, 2025;
originally announced March 2025.
-
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Authors:
Guotao Liang,
Baoquan Zhang,
Zhiyuan Wen,
Junteng Zhao,
Yunming Ye,
Kola Ye,
Yao He
Abstract:
Image quantization is a crucial technique in image generation, aimed at learning a codebook that encodes an image into a discrete token sequence. Recent work has explored learning multi-modal codebooks (i.e., text-aligned codebooks) by utilizing image caption semantics, aiming to enhance codebook performance in cross-modal tasks. However, existing image-text paired datasets exhibit a notable flaw in that the text descriptions tend to be overly concise, failing to adequately describe the images and provide sufficient semantic knowledge, resulting in limited alignment of text and codebook at a fine-grained level. In this paper, we propose a novel Text-Augmented Codebook Learning framework, named TA-VQ, which generates longer text for each image using a visual-language model for improved text-aligned codebook learning. However, the long text presents two key challenges: how to encode the text and how to align the codebook with the text. To tackle these two challenges, we propose to split the long text into multiple granularities for encoding, i.e., word, phrase, and sentence, so that the long text can be fully encoded without losing any key semantic knowledge. Following this, a hierarchical encoder and a novel sampling-based alignment strategy are designed to achieve fine-grained codebook-text alignment. Additionally, our method can be seamlessly integrated into existing VQ models. Extensive experiments in reconstruction and various downstream tasks demonstrate its effectiveness compared to previous state-of-the-art approaches.
Submitted 11 March, 2025; v1 submitted 3 March, 2025;
originally announced March 2025.
-
Convex Hull-based Algebraic Constraint for Visual Quadric SLAM
Authors:
Xiaolong Yu,
Junqiao Zhao,
Shuangfu Song,
Zhongyang Zhu,
Zihan Yuan,
Chen Ye,
Tiantian Feng
Abstract:
Using quadrics as the object representation has the benefits of both generality and closed-form projection derivation between image and world spaces. Although numerous constraints have been proposed for dual quadric reconstruction, we found that many of them are imprecise and provide minimal improvements to localization. After scrutinizing the existing constraints, we introduce a concise yet more precise convex hull-based algebraic constraint for object landmarks, which is applied to object reconstruction, frontend pose estimation, and backend bundle adjustment. This constraint is designed to fully leverage precise semantic segmentation, effectively mitigating mismatches between complex-shaped object contours and dual quadrics. Experiments on public datasets demonstrate that our approach is applicable to both monocular and RGB-D SLAM and achieves better object mapping and localization than existing quadric SLAM methods. The implementation of our method is available at https://github.com/tiev-tongji/convexhull-based-algebraic-constraint.
Submitted 3 March, 2025;
originally announced March 2025.
-
Simulation of the Background from $^{13}$C$(α, n)^{16}$O Reaction in the JUNO Scintillator
Authors:
JUNO Collaboration,
Thomas Adam,
Kai Adamowicz,
Shakeel Ahmad,
Rizwan Ahmed,
Sebastiano Aiello,
Fengpeng An,
Costas Andreopoulos,
Giuseppe Andronico,
Nikolay Anfimov,
Vito Antonelli,
Tatiana Antoshkina,
João Pedro Athayde Marcondes de André,
Didier Auguste,
Weidong Bai,
Nikita Balashov,
Andrea Barresi,
Davide Basilico,
Eric Baussan,
Marco Beretta,
Antonio Bergnoli,
Nikita Bessonov,
Daniel Bick,
Lukas Bieger,
Svetlana Biktemerova, et al. (608 additional authors not shown)
Abstract:
Large-scale organic liquid scintillator detectors are highly efficient in the detection of MeV-scale electron antineutrinos. These signal events can be detected through inverse beta decay on protons, which produces a positron accompanied by a neutron. A noteworthy background for antineutrinos coming from nuclear power reactors and from the depths of the Earth (geoneutrinos) is generated by ($α, n$) reactions. In organic liquid scintillator detectors, $α$ particles emitted from intrinsic contaminants such as $^{238}$U, $^{232}$Th, and $^{210}$Pb/$^{210}$Po can be captured on $^{13}$C nuclei, followed by the emission of a MeV-scale neutron. Three distinct interaction mechanisms can produce prompt energy depositions preceding the delayed neutron capture, leading to a pair of events correlated in space and time within the detector. Thus, ($α, n$) reactions represent an indistinguishable background in liquid scintillator-based antineutrino detectors, where their expected rate and energy spectrum are typically evaluated via Monte Carlo simulations. This work presents results from the open-source SaG4n software, used to calculate the expected energy depositions from the neutron and any associated de-excitation products. Also simulated is a detailed detector response to these interactions, using a dedicated Geant4-based simulation software from the JUNO experiment. An expected measurable $^{13}$C$(α, n)^{16}$O event rate and reconstructed prompt energy spectrum, with associated uncertainties, are presented in the context of JUNO; however, the methods and results are applicable and relevant to other organic liquid scintillator neutrino detectors.
Submitted 2 May, 2025; v1 submitted 2 March, 2025;
originally announced March 2025.
-
Evaluating Personalized Tool-Augmented LLMs from the Perspectives of Personalization and Proactivity
Authors:
Yupu Hao,
Pengfei Cao,
Zhuoran Jin,
Huanxuan Liao,
Yubo Chen,
Kang Liu,
Jun Zhao
Abstract:
Personalized tool utilization is essential for aligning large language models (LLMs) with user preference in interaction scenarios with various tools. However, most current benchmarks focus on either personalization of text generation or direct tool use, without considering both. In this work, we introduce a novel benchmark, ETAPP, for evaluating personalized tool invocation, establishing a sandbox environment and a comprehensive dataset of 800 testing cases covering diverse user profiles. To improve the accuracy of our evaluation, we propose a key-point-based LLM evaluation method, mitigating biases in the LLM-as-a-judge system by manually annotating key points for each test case and providing them to the LLM as a reference. Additionally, we evaluate leading LLMs and provide an in-depth analysis. Furthermore, we investigate the impact of different tool-invoking strategies on LLMs' personalization performance and the effects of fine-tuning in our task. The effectiveness of our preference-setting and key-point-based evaluation method is also validated. Our findings offer insights into improving personalized LLM agents. Our code is available at https://github.com/hypasd-art/ETAPP.
Submitted 12 April, 2025; v1 submitted 2 March, 2025;
originally announced March 2025.
-
LoR2C: Low-Rank Residual Connection Adaptation for Parameter-Efficient Fine-Tuning
Authors:
Jiancheng Zhao,
Xingda Yu,
Yuxiang Zhang,
Zhen Yang
Abstract:
In recent years, pretrained large language models have demonstrated outstanding performance across various natural language processing tasks. However, full-parameter fine-tuning methods require adjusting all model parameters, leading to immense computational resource demands. Although parameter-efficient fine-tuning methods like LoRA have significantly reduced the number of parameters, they still face challenges such as gradient vanishing and the potential for further parameter reduction. To address these issues, this paper proposes a novel parameter-efficient fine-tuning method called LoR2C (Low-Rank Residual Connection Adaptation). LoR2C introduces residual connections with low-rank matrices within the model layers, which not only reduces the number of fine-tuning parameters but also effectively alleviates the gradient vanishing problem. Additionally, this paper presents three optimization variants of LoR2C: ShareLoR2C, MergeLoR2C, and InjectLoR2C. These variants further improve parameter efficiency and model performance through parameter sharing, module merging, and injection mechanisms, respectively. Experimental results on multiple natural language understanding and natural language generation tasks demonstrate that LoR2C and its optimized variants significantly reduce parameter overhead while maintaining or even improving performance, outperforming existing mainstream parameter-efficient fine-tuning methods. Our code is publicly available at https://github.com/Oblivioniss/LoR2C.
Submitted 1 March, 2025;
originally announced March 2025.
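The LoR2C abstract above describes residual connections built from low-rank matrices, without giving shapes or placement. The sketch below is a generic illustration of that idea, not the authors' implementation: it adds a shortcut `x @ A @ B` to a layer output and compares the trainable-parameter count of a rank-`r` pair with a full weight matrix. The helper names, the shapes, and the choice of hidden size 768 with rank 8 are all assumptions for the example.

```python
def full_params(d_in, d_out):
    """Trainable parameters in a dense d_in x d_out weight matrix."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Trainable parameters in a factor pair A (d_in x rank), B (rank x d_out)."""
    return rank * (d_in + d_out)


def vecmat(v, M):
    """Row vector v (length n) times matrix M (n x k), giving a list of length k."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def low_rank_residual(x, layer_out, A, B):
    """Add the low-rank shortcut x @ A @ B to an already computed layer output.

    Hypothetical placement: the abstract does not say where in the layer
    the connection is attached.
    """
    shortcut = vecmat(vecmat(x, A), B)
    return [o + s for o, s in zip(layer_out, shortcut)]


if __name__ == "__main__":
    d, r = 768, 8  # assumed sizes, for illustration only
    print(full_params(d, d), low_rank_params(d, d, r))
    print(low_rank_residual([1.0, 2.0], [1.0, 1.0], [[1.0], [1.0]], [[2.0, 0.0]]))
```

With these assumed sizes the factor pair holds 12,288 values against 589,824 for the dense matrix, a 48-fold reduction; the actual savings reported for LoR2C depend on details the abstract does not give.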
-
The effect of remote work on urban transportation emissions: evidence from 141 cities
Authors:
Sophia Shen,
Xinyi Wang,
Nicholas Caros,
Jinhua Zhao
Abstract:
The overall impact of working from home (WFH) on transportation emissions remains a complex issue, with significant implications for policymaking. This study matches socioeconomic information from the American Community Survey (ACS) to the global carbon emissions dataset for selected Metropolitan Statistical Areas (MSAs) in the US. We analyze the impact of WFH on transportation emissions before and during the COVID-19 pandemic. Employing cross-sectional multiple regression models and Blinder-Oaxaca decomposition, we examine how WFH, commuting mode, and car ownership influence transportation emissions across 141 MSAs in the United States. We find that the prevalence of WFH in 2021 is associated with lower transportation emissions, whereas WFH in 2019 did not significantly impact transportation emissions. After controlling for public transportation usage and car ownership, we find that a 1% increase in WFH corresponds to a 0.17 kilogram or 1.8% reduction of daily average transportation emissions per capita. The Blinder-Oaxaca decomposition shows that WFH is the main driver in reducing transportation emissions per capita during the pandemic. Our results show that the reductive influence of public transportation on transportation emissions has declined, while the impact of car ownership on increasing transportation emissions has risen. Collectively, these results indicate a multifaceted impact of WFH on transportation emissions. This study underscores the need for a nuanced, data-driven approach in crafting WFH policies to mitigate transportation emissions effectively.
Submitted 1 March, 2025;
originally announced March 2025.
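The abstract above quotes the WFH effect both as an absolute change (0.17 kg) and as a relative change (1.8%) in daily per-capita transportation emissions. If both figures are taken to describe the same change against the same baseline, which is an assumption of this note rather than a statement in the abstract, the implied baseline follows by division. The function name below is hypothetical.

```python
def implied_baseline(abs_change, rel_change):
    """Baseline value for which abs_change equals rel_change * baseline."""
    if rel_change == 0:
        raise ValueError("relative change must be non-zero")
    return abs_change / rel_change


if __name__ == "__main__":
    # 0.17 kg and 1.8% are the two figures quoted in the abstract.
    baseline_kg = implied_baseline(0.17, 0.018)
    print(f"implied daily per-capita baseline: {baseline_kg:.1f} kg")
```

Under that assumption the baseline comes out at roughly 9.4 kg per person per day; the paper itself should be consulted for the actual sample mean.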
-
Brickify: Enabling Expressive Design Intent Specification through Direct Manipulation on Design Tokens
Authors:
Xinyu Shi,
Yinghou Wang,
Ryan Rossi,
Jian Zhao
Abstract:
Expressing design intent using natural language prompts requires designers to verbalize the ambiguous visual details concisely, which can be challenging or even impossible. To address this, we introduce Brickify, a visual-centric interaction paradigm -- expressing design intent through direct manipulation on design tokens. Brickify extracts visual elements (e.g., subject, style, and color) from reference images and converts them into interactive and reusable design tokens that can be directly manipulated (e.g., resize, group, link, etc.) to form the visual lexicon. The lexicon reflects users' intent for both what visual elements are desired and how to construct them into a whole. We developed Brickify to demonstrate how AI models can interpret and execute the visual lexicon through an end-to-end pipeline. In a user study, experienced designers found Brickify more efficient and intuitive than text-based prompts, allowing them to describe visual details, explore alternatives, and refine complex designs with greater ease and control.
Submitted 28 February, 2025;
originally announced February 2025.
-
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Authors:
Xiusheng Huang,
Jiaxiang Liu,
Yequan Wang,
Jun Zhao,
Kang Liu
Abstract:
Large-scale language models have achieved superior performance in tasks related to natural language processing; however, it is still unclear how model parameters affect performance improvement. Previous studies assumed that individual knowledge is stored in local parameters, and the storage form of individual knowledge is dispersed parameters, parameter layers, or parameter chains, which are not unified. We found through fidelity and reliability evaluation experiments that individual knowledge cannot be localized. Afterwards, we constructed a dataset for decoupling experiments and discovered the potential for localizing data commonalities. To further reveal this phenomenon, this paper proposes a Commonality Neuron Localization (CNL) method, which successfully locates commonality neurons and achieves a neuron overlap rate of 96.42% on the GSM8K dataset. Finally, we have demonstrated through cross-data experiments that commonality neurons are a collection of capability neurons that possess the capability to enhance performance. Our code is available at https://github.com/nlpkeg/Capability-Neuron-Localization.
Submitted 28 February, 2025;
originally announced February 2025.
-
DiffBrush: Just Painting the Art by Your Hands
Authors:
Jiaming Chu,
Lei Jin,
Tao Wang,
Junliang Xing,
Jian Zhao
Abstract:
The rapid development of image generation and editing algorithms in recent years has enabled ordinary users to produce realistic images. However, the current AI painting ecosystem predominantly relies on text-driven diffusion models (T2I), which pose challenges in accurately capturing user requirements. Furthermore, achieving compatibility with other modalities incurs substantial training costs. To this end, we introduce DiffBrush, which is compatible with T2I models and allows users to draw and edit images. By manipulating and adapting the internal representation of the diffusion model, DiffBrush guides the model-generated images to converge towards the user's hand-drawn sketches, meeting the user's specific needs without additional training. DiffBrush achieves control over the color, semantics, and instances of objects in images by continuously guiding the latent and instance-level attention map during the denoising process of the diffusion model. In addition, we propose latent regeneration, which refines the randomly sampled noise in the diffusion model to obtain a better image generation layout. Finally, users only need to roughly draw the mask of the instance (acceptable colors) on the canvas, and DiffBrush can naturally generate the corresponding instance at the corresponding location.
Submitted 28 February, 2025;
originally announced February 2025.
-
Improved measurement of absolute branching fraction of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (679 additional authors not shown)
Abstract:
By analyzing $4.5$ fb$^{-1}$ of $e^{+}e^{-}$ collision data accumulated with the BESIII detector at center-of-mass energies ranging from $4599.53$ MeV to $4698.82$ MeV, we report the measurement of the absolute branching fraction (BF) of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$ using the double-tag technique. The result is $\mathcal{B}(Λ_{c}^{+} \to K_{S}^{0} X)=(10.9\pm0.2\pm0.1)\%$, where the first uncertainty is statistical and the second is systematic. This result indicates that there are still undiscovered decay channels containing $K_{S}^{0}$ in the final state with a combined BF of $(3.1\pm0.4)\%$. The BF of the inclusive decay $Λ_{c}^{+} \to \overline{K}^{0} / K^{0} X$ is calculated to be $\mathcal{B}(Λ_{c}^{+} \to \overline{K}^{0} / K^{0} X)=(21.8 \pm0.4 \pm0.2 \pm1.1)\%$, where the third uncertainty accounts for a possible difference between $\mathcal{B}(Λ_{c}^{+} \to K_{S}^{0} X)$ and $\mathcal{B}(Λ_{c}^{+} \to K_{L}^{0} X)$. The result is in agreement with the prediction of the statistical isospin model.
Submitted 28 February, 2025;
originally announced February 2025.
-
Unraveling the origin of Kondo-like behavior in the 3$d$-electron heavy-fermion compound YFe$_{2}$Ge$_{2}$
Authors:
Bing Xu,
Rui Liu,
Hongliang Wo,
Zhiyu Liao,
Shaohui Yi,
Chunhong Li,
Jun Zhao,
Xianggang Qiu,
Zhiping Yin,
Christian Bernhard
Abstract:
The heavy fermion (HF) state of $d$-electron systems is of great current interest since it exhibits various exotic phases and phenomena that are reminiscent of the Kondo effect in $f$-electron HF systems. Here, we present a combined infrared spectroscopy and first-principles band structure calculation study of the $3d$-electron HF compound YFe$_2$Ge$_2$. The infrared response exhibits several charge-dynamical hallmarks of HF and a corresponding scaling behavior that resemble those of the $f$-electron HF systems. In particular, the low-temperature spectra reveal a dramatic narrowing of the Drude response along with the appearance of a hybridization gap ($Δ\sim$ 50 meV) and a strongly enhanced quasiparticle effective mass. Moreover, the temperature dependence of the infrared response indicates a crossover around $T^{\ast} \sim$ 100 K from a coherent state at low temperature to a quasi-incoherent one at high temperature. Despite these striking similarities, our band structure calculations suggest that the mechanism underlying the HF behavior in YFe$_2$Ge$_2$ is distinct from the Kondo scenario of the $f$-electron HF compounds and even from that of the $d$-electron iron-arsenide superconductor KFe$_2$As$_2$. For the latter, the HF state is driven by orbital-selective correlations due to a strong Hund's coupling. Instead, for YFe$_2$Ge$_2$ the HF behavior originates from the band flatness near the Fermi level induced by the combined effects of kinetic frustration from a destructive interference between the direct Fe-Fe and indirect Fe-Ge-Fe hoppings, band hybridization involving Fe $3d$ and Y $4d$ electrons, and electron correlations. This highlights that rather different mechanisms can be at the heart of the HF state in $d$-electron systems.
Submitted 28 February, 2025;
originally announced February 2025.
-
Precision measurement of the branching fraction for the decay $ψ(2S)\rightarrowτ^{+}τ^{-}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (691 additional authors not shown)
Abstract:
Using $(2259.3 \pm 11.1)\times10^{6}$ $ψ(2S)$ events acquired with the BESIII detector, the branching fraction of $ψ(2S)\rightarrowτ^{+}τ^{-}$ is measured with improved precision to be $\mathcal{B}_{ψ(2S)\rightarrowτ^{+}τ^{-}}=(3.240~\pm~0.023~\pm~0.081)\times 10^{-3}$, where the first and second uncertainties are statistical and systematic, respectively, which is consistent with the world average value within one standard deviation. This value, along with those for the branching fractions of the $ψ(2S)$ decaying into $e^{+}e^{-}$ and $μ^{+}μ^{-}$, is in good agreement with the relation predicted by the sequential lepton hypothesis. Combining the branching fraction values with the leptonic width of the $ψ(2S)$, the total width of the $ψ(2S)$ is determined to be (287 $\pm$ 9) keV.
Submitted 27 February, 2025;
originally announced February 2025.
-
Will the Technological Singularity Come Soon? Modeling the Dynamics of Artificial Intelligence Development via Multi-Logistic Growth Process
Authors:
Guangyin Jin,
Xiaohan Ni,
Kun Wei,
Jie Zhao,
Haoming Zhang,
Leiming Jia
Abstract:
We are currently in an era of escalating technological complexity and profound societal transformations, where artificial intelligence (AI) technologies exemplified by large language models (LLMs) have reignited discussions on the 'Technological Singularity'. 'Technological Singularity' is a philosophical concept referring to an irreversible and profound transformation that occurs when AI capabilities surpass those of humans comprehensively. However, quantitative modeling and analysis of the historical evolution and future trends of AI technologies remain scarce, failing to substantiate the singularity hypothesis adequately. This paper hypothesizes that the development of AI technologies could be characterized by the superposition of multiple logistic growth processes. To explore this hypothesis, we propose a multi-logistic growth process model and validate it using two real-world datasets: AI Historical Statistics and Arxiv AI Papers. Our analysis of the AI Historical Statistics dataset assesses the effectiveness of the multi-logistic model and evaluates the current and future trends in AI technology development. Additionally, cross-validation experiments on the Arxiv AI Paper, GPU Transistor and Internet User dataset enhance the robustness of our conclusions derived from the AI Historical Statistics dataset. The experimental results reveal that around 2024 marks the fastest point of the current AI wave, and the deep learning-based AI technologies are projected to decline around 2035-2040 if no fundamental technological innovation emerges. Consequently, the technological singularity appears unlikely to arrive in the foreseeable future.
Submitted 10 February, 2025;
originally announced February 2025.
-
DataMan: Data Manager for Pre-training Large Language Models
Authors:
Ru Peng,
Kexin Yang,
Yawen Zeng,
Junyang Lin,
Dayiheng Liu,
Junbo Zhao
Abstract:
The performance emergence of large language models (LLMs) driven by data scaling laws makes the selection of pre-training data increasingly important. However, existing methods rely on limited heuristics and human intuition, lacking comprehensive and clear guidelines. To address this, we are inspired by ``reverse thinking'' -- prompting LLMs to self-identify which criteria benefit its performance. As its pre-training capabilities are related to perplexity (PPL), we derive 14 quality criteria from the causes of text perplexity anomalies and introduce 15 common application domains to support domain mixing. In this paper, we train a Data Manager (DataMan) to learn quality ratings and domain recognition from pointwise rating, and use it to annotate a 447B token pre-training corpus with 14 quality ratings and domain type. Our experiments validate our approach, using DataMan to select 30B tokens to train a 1.3B-parameter language model, demonstrating significant improvements in in-context learning (ICL), perplexity, and instruction-following ability over the state-of-the-art baseline. The best-performing model, based on the Overall Score l=5 surpasses a model trained with 50% more data using uniform sampling. We continue pre-training with high-rated, domain-specific data annotated by DataMan to enhance domain-specific ICL performance and thus verify DataMan's domain mixing ability. Our findings emphasize the importance of quality ranking, the complementary nature of quality criteria, and their low correlation with perplexity, analyzing misalignment between PPL and ICL performance. We also thoroughly analyzed our pre-training dataset, examining its composition, the distribution of quality ratings, and the original document sources.
Submitted 7 April, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
U(1) Dirac quantum spin liquid candidate in triangular-lattice antiferromagnet CeMgAl$_{11}$O$_{19}$
Authors:
Yantao Cao,
Akihiro Koda,
M. D. Le,
V. Pomjakushin,
Benqiong Liu,
Zhendong Fu,
Zhiwei Li,
Jinkui Zhao,
Zhaoming Tian,
Hanjie Guo
Abstract:
Quantum spin liquid represents an intriguing state where electron spins are highly entangled yet spin fluctuation persists even at 0 K. Recently, the hexaaluminates \textit{R}MgAl$_{11}$O$_{19}$ (\textit{R} = rare earth) have been proposed to be a platform for realizing the quantum spin liquid state with dominant Ising anisotropic correlations. Here, we report detailed low-temperature magnetic susceptibility, muon spin relaxation, and thermodynamic studies on the CeMgAl$_{11}$O$_{19}$ single crystal. Ising anisotropy is revealed by magnetic susceptibility measurements. Muon spin relaxation and ac susceptibility measurements rule out any long-range magnetic ordering or spin freezing down to 50 mK despite the onset of spin correlations below $\sim$0.8 K. Instead, the spins keep fluctuating at a rate of 1.0(2) MHz at 50 mK. Specific heat results indicate a gapless excitation with a power-law dependence on temperature, $C_m(T) \propto T^α$. The quasi-quadratic temperature dependence with $α$ = 2.28(4) in zero field and linear temperature dependence in 0.25 T support the possible realization of the U(1) Dirac quantum spin liquid state.
Submitted 26 February, 2025;
originally announced February 2025.
-
Evaluating Intelligence via Trial and Error
Authors:
Jingtao Zhan,
Jiahao Zhao,
Jiayu Li,
Yiqun Liu,
Bo Zhang,
Qingyao Ai,
Jiaxin Mao,
Hongning Wang,
Min Zhang,
Shaoping Ma
Abstract:
Intelligence is a crucial trait for species to find solutions within a limited number of trial-and-error attempts. Building on this idea, we introduce Survival Game as a framework to evaluate intelligence based on the number of failed attempts in a trial-and-error process. Fewer failures indicate higher intelligence. When the expectation and variance of failure counts are both finite, it signals the ability to consistently find solutions to new challenges, which we define as the Autonomous Level of intelligence. Using Survival Game, we comprehensively evaluate existing AI systems. Our results show that while AI systems achieve the Autonomous Level in simple tasks, they are still far from it in more complex tasks, such as vision, search, recommendation, and language. While scaling current AI technologies might help, this would come at an astronomical cost. Projections suggest that achieving the Autonomous Level for general tasks would require $10^{26}$ parameters. To put this into perspective, loading such a massive model requires so many H100 GPUs that their total value is $10^{7}$ times that of Apple Inc.'s market value. Even with Moore's Law, supporting such a parameter scale would take $70$ years. This staggering cost highlights the complexity of human tasks and the inadequacies of current AI technologies. To further investigate this phenomenon, we conduct a theoretical analysis of Survival Game and its experimental results. Our findings suggest that human tasks possess a criticality property. As a result, Autonomous Level requires a deep understanding of the task's underlying mechanisms. Current AI systems, however, do not fully grasp these mechanisms and instead rely on superficial mimicry, making it difficult for them to reach an autonomous level. We believe Survival Game can not only guide the future development of AI but also offer profound insights into human intelligence.
Submitted 3 March, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Authors:
Yu Bo,
Weian Mao,
Yanjun Shao,
Weiqiang Bai,
Peng Ye,
Xinzhu Ma,
Junbo Zhao,
Hao Chen,
Chunhua Shen
Abstract:
In recent years, a variety of methods based on Transformer and state space model (SSM) architectures have been proposed, advancing foundational DNA language models. However, there is a lack of comparison between these recent approaches and the classical architecture convolutional networks (CNNs) on foundation model benchmarks. This raises the question: are CNNs truly being surpassed by these recent approaches based on transformer and SSM architectures? In this paper, we develop a simple but well-designed CNN-based method termed ConvNova. ConvNova identifies and proposes three effective designs: 1) dilated convolutions, 2) gated convolutions, and 3) a dual-branch framework for gating mechanisms. Through extensive empirical experiments, we demonstrate that ConvNova significantly outperforms recent methods on more than half of the tasks across several foundation model benchmarks. For example, in histone-related tasks, ConvNova exceeds the second-best method by an average of 5.8%, while generally utilizing fewer parameters and enabling faster computation. In addition, the experiments observed findings that may be related to biological characteristics. This indicates that CNNs are still a strong competitor compared to Transformers and SSMs. We anticipate that this work will spark renewed interest in CNN-based methods for DNA foundation models.
Submitted 25 February, 2025;
originally announced February 2025.
-
DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis
Authors:
Zeju Li,
Changran Xu,
Zhengyuan Shi,
Zedong Peng,
Yi Liu,
Yunhao Zhou,
Lingfeng Zhou,
Chengyu Ma,
Jianyuan Zhong,
Xi Wang,
Jieru Zhao,
Zhufei Chu,
Xiaoyan Yang,
Qiang Xu
Abstract:
This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs finetuned with our dataset and confirm the quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation. Our data is available at https://zeju.gitbook.io/lcm-team.
Submitted 25 February, 2025;
originally announced February 2025.
-
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching
Authors:
Nuo Xu,
Pinghui Wang,
Zi Liang,
Junzhou Zhao,
Xiaohong Guan
Abstract:
Legal case retrieval (LCR) aims to automatically scour for comparable legal cases based on a given query, which is crucial for offering relevant precedents to support the judgment in intelligent legal systems. Due to similar goals, it is often associated with a similar case matching (LCM) task. To address them, a daunting challenge is assessing the uniquely defined legal-rational similarity within the judicial domain, which distinctly deviates from the semantic similarities in general text retrieval. Past works either tagged domain-specific factors or incorporated reference laws to capture legal-rational information. However, their heavy reliance on expert or unrealistic assumptions restricts their practical applicability in real-world scenarios. In this paper, we propose an end-to-end model named LCM-LAI to solve the above challenges. Through meticulous theoretical analysis, LCM-LAI employs a dependent multi-task learning framework to capture legal-rational information within legal cases by a law article prediction (LAP) sub-task, without any additional assumptions in inference. Besides, LCM-LAI proposes an article-aware attention mechanism to evaluate the legal-rational similarity between across-case sentences based on law distribution, which is more effective than conventional semantic similarity. We perform a series of exhaustive experiments including two different tasks involving four real-world datasets. Results demonstrate that LCM-LAI achieves state-of-the-art performance.
Submitted 25 February, 2025;
originally announced February 2025.
-
Observed Dispersive Properties of the Slow Magnetoacoustic Waves Propagating in Coronal Fan Loops above Sunspots
Authors:
Junwei Zhao,
Tongjiang Wang,
Ruizhu Chen
Abstract:
Recurrent and propagating intensity perturbations are frequently observed in extreme ultraviolet (EUV) channels along coronal fan loops above sunspots, and these perturbations are suggested to be slow magnetoacoustic waves. Numerous studies have been conducted to investigate their propagation speeds, damping, and excitation sources; however, there have been limited observational analyses on whether these waves are dispersive despite some theoretical studies. In this study, we apply cross-correlation analysis in the Fourier domain on slow magnetoacoustic waves using three different datasets: EUV intensity observed by SDO/AIA, differential emission measure (DEM) temperature maps, and Doppler velocities from Hinode/EIS spectrometer observations. The apparent phase velocities of the waves, which are the plane-of-sky component of the waves' phase velocities, are derived as functions of frequency for all the three datasets. It is found that the phase velocities show clear frequency dependency, with a general trend of increase with frequency, ranging from approximately 30 km/s around 3 mHz to about 80 km/s around 10 mHz. The frequency dependency of the phase velocities demonstrates that the slow magnetoacoustic waves in the coronal loops are dispersive. The dispersiveness of these waves can provide a useful tool for the diagnosis of physical conditions inside the coronal loops along which these waves travel.
Submitted 25 February, 2025;
originally announced February 2025.
-
Waveguide Division Multiple Access for Pinching-Antenna Systems (PASS)
Authors:
Jingjing Zhao,
Xidong Mu,
Kaiquan Cai,
Yanbo Zhu,
Yuanwei Liu
Abstract:
A novel concept of waveguide division multiple access (WDMA) is proposed for multi-user pinching-antenna systems (PASS). The key principle of WDMA is to allocate each user with a dedicated waveguide, which is regarded as a new type of radio resources, so as to facilitate multi-user communications. By adjusting the activation positions of pinching antennas (PAs) over each waveguide, the pinching beamforming can be exploited for intended user signal enhancement and inter-user interference mitigation. Considering both ideal continuous and practical discrete PA position activation schemes, a joint power allocation and pinching beamforming optimization problem is formulated for the maximization of the sum rate. An alternating optimization-based algorithm is developed to address the formulated nonconvex problem. For solving the power allocation subproblem, the successive convex approximation method is invoked. For the pinching beamforming design subproblem, a penalty-based gradient ascent algorithm is first developed for the continuous PA activation case. Then, for the discrete PA activation case, a matching theory-based algorithm is proposed to achieve the near-optimal performance but with a low complexity. Numerical results unveil that: 1) For both continuous and discrete activation cases, PASS can achieve a significant performance gain over conventional fixed-position antenna systems; 2) the proposed WDMA can effectively underpin multi-user communications with the near orthogonality in free space achieved by the pinching beamforming; and 3) the performance gap between the discrete and continuous activation cases can be significantly alleviated with practically feasible numbers of PA candidate positions.
Submitted 24 February, 2025;
originally announced February 2025.
-
External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
Authors:
Mingfu Liang,
Xi Liu,
Rong Jin,
Boyang Liu,
Qiuling Suo,
Qinghai Zhou,
Song Zhou,
Laming Chen,
Hua Zheng,
Zhiyuan Li,
Shali Jiang,
Jiyan Yang,
Xiaozhen Xia,
Fan Yang,
Yasmine Badr,
Ellie Wen,
Shuyu Xu,
Hansey Chen,
Zhengyu Zhang,
Jade Nie,
Chunzhi Yang,
Zhichen Zeng,
Weilin Zhang,
Xingliang Huang,
Qianru Li
, et al. (80 additional authors not shown)
Abstract:
Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications. First, training and inference budgets are restricted for the model to be served, exceeding which may incur latency and impair user experience. Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address the overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training/inference while maintaining high performance. We design the teacher in a way like a foundation model (FM) that can serve multiple students as vertical models (VMs) to amortize its building cost. We propose Auxiliary Head and Student Adapter to mitigate the data distribution gap between FM and VMs caused by the streaming data issue. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gain by ExFM.
Submitted 23 April, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Ancilla theory of twisted bilayer graphene I: topological Mott semimetal and symmetric pseudogap metal
Authors:
Jing-Yu Zhao,
Boran Zhou,
Ya-Hui Zhang
Abstract:
In this work, we demonstrate that Mott physics in twisted bilayer graphene (TBG) can be conveniently captured using the ancilla theory, originally proposed in the context of high-Tc cuprates [Zhang and Sachdev, Phys. Rev. Res. 2, 023172 (2020)]. In TBG, the ancilla formalism allows us to calculate the Mott Hubbard bands directly in momentum space, both at and away from the magic angle. Projected to the active bands, we reveal a topological obstruction for the hybridization $Φ(\mathbf k)$ between the physical and ancilla bands around the $Γ$ point, leading to a topological Mott semimetal at $ν=0$. At fillings $ν=\pm 1, \pm 2, \pm 3$, we obtain symmetric correlated insulators at large $U$, and also transitions to semimetals at smaller $U$ or larger bandwidth. At $ν=-2-x$, we propose a symmetric pseudogap metal at small $x$, which hosts a small Fermi surface. The symmetric pseudogap metal can survive to the zero-temperature limit when there is a sizable anti-Hund's coupling $J_A$. In that case we can write down a model wavefunction within the subspace of active bands. The small Fermi surface of the pseudogap metal is primarily formed by ancilla fermions, which we interpret as composite polarons--consisting of a spin moment on an AA site bound to a hole in the nearest neighbor AA site. Within the active band subspace, the composite polaron at $\mathbf k=0$ is orthogonal to the single-particle state due to their differing angular momenta, and thus has vanishing spectral weight. We suggest that superconductivity emerges from the Cooper pairing of these composite fermions instead of single electrons.
Submitted 10 March, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
A new framework for X-ray absorption spectroscopy data analysis based on machine learning: XASDAML
Authors:
Xue Han,
Haodong Yao,
Fei Zhan,
Xueqi Song,
Junfang Zhao,
Haifeng Zhao
Abstract:
X-ray absorption spectroscopy (XAS) is a powerful technique to probe the electronic and structural properties of materials. With the rapid growth in both the volume and complexity of XAS datasets driven by advancements in synchrotron radiation facilities, there is an increasing demand for advanced computational tools capable of efficiently analyzing large-scale data. To address these needs, we introduce XASDAML, a flexible, machine learning based framework that integrates the entire data-processing workflow-including dataset construction for spectra and structural descriptors, data filtering, ML modeling, prediction, and model evaluation-into a unified platform. Additionally, it supports comprehensive statistical analysis, leveraging methods such as principal component analysis and clustering to reveal potential patterns and relationships within large datasets. Each module operates independently, allowing users to modify or upgrade modules in response to evolving research needs or technological advances. Moreover, the platform provides a user-friendly interface via Jupyter Notebook, making it accessible to researchers at varying levels of expertise. The versatility and effectiveness of XASDAML are exemplified by its application to a copper dataset, where it efficiently manages large and complex data, supports both supervised and unsupervised machine learning models, provides comprehensive statistics for structural descriptors, generates spectral plots, and accurately predicts coordination numbers and bond lengths. Furthermore, the platform streamlines the integration of XAS with machine learning and lowers the barriers to entry for new users.
△ Less
Submitted 23 February, 2025;
originally announced February 2025.
-
Sequence-level Large Language Model Training with Contrastive Preference Optimization
Authors:
Zhili Feng,
Dhananjay Ram,
Cole Hawkins,
Aditya Rawal,
Jinman Zhao,
Sheng Zha
Abstract:
The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human labeled data. Our experiments show that the proposed objective surpasses the next token prediction in terms of win rate in the instruction-following and text generation tasks.
Submitted 22 February, 2025;
originally announced February 2025.
-
Propagation Performance of Terahertz Channels in Lunar Dust
Authors:
Peian Li,
Jiabiao Zhao,
Mingxia Zhang,
Yuheng Song,
Wenbo Liu,
Lingfeng Tian,
Chen Yao,
Jianjun Ma
Abstract:
The growing momentum in lunar exploration programs and urgent need for robust communication systems capable of operating in dust-laden lunar environments necessitate comprehensive understanding of channel propagation characteristics in lunar conditions. In this article, we present a comprehensive analysis of terahertz (THz) channel propagation characteristics through lunar dust environments, critical for establishing reliable communication and sensing infrastructure on the Moon. We develop an extended Mie scattering model incorporating the unique properties of lunar dust particles (Apollo 11 sample 10084, Apollo 14 sample 14003, and Apollo 17 sample 70051), including their irregular morphology, dielectric characteristics, and charge-dependent behavior. Through theoretical analysis and experimental verification, we examine both power and bit error rate (BER) performance across varying dust conditions. Our results reveal distinct relationships between particle charge levels, morphological characteristics, and channel performance with power loss patterns and BER evolution. Our findings provide essential guidelines for developing robust lunar communication systems that integrate sensing capabilities, contributing to the establishment of sustainable lunar infrastructure.
Submitted 14 March, 2025; v1 submitted 22 February, 2025;
originally announced February 2025.
-
Single Inclusive $π^\pm$ and $K^\pm$ Production in $e^+e^-$ Annihilation at center-of-mass Energies from 2.000 to 3.671 GeV
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (707 additional authors not shown)
Abstract:
Using data samples with a total integrated luminosity of 253 $\rm pb^{-1}$ collected by the BESIII detector operating at the BEPCII collider, the differential cross-sections of inclusive $π^\pm$ and $K^\pm$ production, as a function of momentum and normalized by the total hadronic cross-section, are measured at center-of-mass energies from 2.000 to 3.671 GeV. The measured $π^{\pm}$ cross sections are consistent with the previously reported $π^{0}$ cross-sections by BESIII, while the $K^{\pm}$ cross sections are systematically higher than the $K^0_S$ cross sections by a factor of approximately 1.4. These new results are in agreement with state-of-the-art QCD analyses at next-to-next-to-leading order accuracy, particularly in the large hadron momentum region at energy scales down to 3 GeV. These findings support the validity of isospin symmetry in parton fragmentation processes.
Submitted 22 February, 2025;
originally announced February 2025.
-
Category-free Out-of-Distribution Node Detection with Feature Resonance
Authors:
Shenzhi Yang,
Junbo Zhao,
Shouqing Yang,
Yixuan Li,
Dingyu Yang,
Xiaofang Zhang,
Haobo Wang
Abstract:
Detecting out-of-distribution (OOD) nodes in the graph-based machine-learning field is challenging, particularly when in-distribution (ID) node multi-category labels are unavailable. Thus, we focus on feature space rather than label space and find that, ideally, during the optimization of known ID samples, unknown ID samples undergo more significant representation changes than OOD samples, even if the model is trained to fit random targets, which we call the Feature Resonance phenomenon. The rationale behind it is that even without gold labels, the local manifold may still exhibit smooth resonance. Based on this, we further develop a novel graph OOD framework, dubbed Resonance-based Separation and Learning (RSL), which comprises two core modules: (i) a more practical micro-level proxy of feature resonance that measures the movement of feature vectors in one training step, and (ii) integration with a synthetic OOD node strategy to train an effective OOD classifier. Theoretically, we derive an error bound showing the superior separability of OOD nodes during the resonance period. Empirically, RSL achieves state-of-the-art performance, reducing the FPR95 metric by an average of 18.51% across five real-world datasets.
Submitted 21 February, 2025;
originally announced February 2025.
-
BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems
Authors:
Tao Duan,
Runqing Chen,
Pinghui Wang,
Junzhou Zhao,
Jiongzhou Liu,
Shujie Han,
Yi Liu,
Fan Xu
Abstract:
Cloud infrastructure is the collective term for all physical devices within cloud systems. Failures within the cloud infrastructure system can severely compromise the stability and availability of cloud services. In particular, batch servers outage, which is the most fatal failure, could result in the complete unavailability of all upstream services. In this work, we focus on the batch servers outage diagnosis problem, aiming to accurately and promptly analyze the root cause of outages to facilitate troubleshooting. However, our empirical study conducted in a real industrial system indicates that it is a challenging task. First, the collected single-modal coarse-grained failure monitoring data (i.e., alert, incident, or change) in the cloud infrastructure system is insufficient for comprehensive failure profiling. Second, due to the intricate dependencies among devices, outages are often the cumulative result of multiple failures, but correlations between failures are difficult to ascertain. To address these problems, we propose BSODiag, an unsupervised and lightweight diagnosis framework for batch servers outage. BSODiag provides a global analytical perspective, thoroughly explores failure information from multi-source monitoring data, models the spatio-temporal correlations among failures, and delivers accurate and interpretable diagnostic results. Experiments conducted on the Alibaba Cloud infrastructure system show that BSODiag achieves 87.5% PR@3 and 46.3% PCR, outperforming baseline methods by 10.2% and 3.7%, respectively.
Submitted 31 January, 2025;
originally announced February 2025.
-
Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (274 additional authors not shown)
Abstract:
In this study, we present a comprehensive analysis of an unidentified point-like ultra-high-energy (UHE) $γ$-ray source, designated as 1LHAASO J1740+0948u, situated in the vicinity of the middle-aged pulsar PSR J1740+1000. The detection significance reached 17.1$σ$ (9.4$σ$) above 25$\,$TeV (100$\,$TeV). The source energy spectrum extended up to 300$\,$TeV and was well fitted by a log-parabola function with $N_0 = (1.93\pm0.23) \times 10^{-16} \rm{TeV^{-1}\,cm^{-2}\,s^{-1}}$, $α= 2.14\pm0.27$, and $β= 1.20\pm0.41$ at $E_0$ = 30$\,$TeV. The associated pulsar, PSR J1740+1000, resides at a high galactic latitude and powers a bow-shock pulsar wind nebula (BSPWN) with an extended X-ray tail. The best-fit position of the gamma-ray source appeared to be shifted by $0.2^{\circ}$ with respect to the pulsar position. As (i) the currently identified pulsar halos do not demonstrate such offsets, and (ii) the centroid of the gamma-ray emission is approximately located at the extension of the X-ray tail, we speculate that the UHE $γ$-ray emission may originate from re-accelerated electron/positron pairs that are advected away in the bow-shock tail.
Submitted 24 February, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
-
Research advances on fish feeding behavior recognition and intensity quantification methods in aquaculture
Authors:
Shulong Zhang,
Daoliang Li,
Jiayin Zhao,
Mingyuan Yao,
Yingyi Chen,
Yukang Huo,
Xiao Liu,
Haihua Wang
Abstract:
As a key part of aquaculture management, fish feeding behavior recognition and intensity quantification has attracted considerable attention from researchers, and it plays a crucial role in monitoring fish health, guiding baiting work and improving aquaculture efficiency. To support future work in this area, this paper first reviews the research advances in fish feeding behavior recognition and intensity quantification methods based on computer vision, acoustics and sensors in a single modality. Then the application of emerging multimodal fusion in fish feeding behavior recognition and intensity quantification methods is discussed. Finally, the advantages and disadvantages of the various techniques are compared and analyzed, and future research directions are outlined.
Submitted 21 February, 2025;
originally announced February 2025.
-
Debunking the Myth of Join Ordering: Toward Robust SQL Analytics
Authors:
Junyi Zhao,
Kai Su,
Yifei Yang,
Xiangyao Yu,
Paraschos Koutris,
Huanchen Zhang
Abstract:
Join order optimization is critical in achieving good query performance. Despite decades of research and practice, modern query optimizers could still generate inferior join plans that are orders of magnitude slower than optimal. Existing research on robust query processing often lacks theoretical guarantees on join-order robustness while sacrificing query performance. In this paper, we rediscover the recent Predicate Transfer technique from a robustness point of view. We introduce two new algorithms, LargestRoot and SafeSubjoin, and then propose Robust Predicate Transfer (RPT) that is provably robust against arbitrary join orders of an acyclic query. We integrated Robust Predicate Transfer with DuckDB, a state-of-the-art analytical database, and evaluated against all the queries in TPC-H, JOB, and TPC-DS benchmarks. Our experimental results show that RPT improves join-order robustness by orders of magnitude compared to the baseline. With RPT, the largest ratio between the maximum and minimum execution time out of random join orders for a single acyclic query is only 1.6x (the ratio is close to 1 for most evaluated queries). Meanwhile, applying RPT also improves the end-to-end query performance by 1.5x (per-query geometric mean). We hope that this work sheds light on solving the practical join ordering problem.
Submitted 6 March, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
AccessFixer: Enhancing GUI Accessibility for Low Vision Users With R-GCN Model
Authors:
Mengxi Zhang,
Huaxiao Liu,
Chunyang Chen,
Guangyong Gao,
Han Li,
Jian Zhao
Abstract:
The Graphical User Interface (GUI) plays a critical role in the interaction between users and mobile applications (apps), aiming at facilitating the operation process. However, due to the variety of functions and non-standardized design, GUIs might have many accessibility issues, like the size of components being too small or their intervals being narrow. These issues would hinder the operation of low vision users, preventing them from obtaining information accurately and conveniently. Although several technologies and methods have been proposed to address these issues, they are typically confined to issue identification, leaving the resolution in the hands of developers. Moreover, it can be challenging to ensure that the color, size, and interval of the fixed GUIs are appropriate compared to the original ones. In this work, we propose a novel approach named AccessFixer, which utilizes the Relational-Graph Convolutional Neural Network (R-GCN) to simultaneously fix three kinds of accessibility issues, including small sizes, narrow intervals, and low color contrast in GUIs. With AccessFixer, the fixed GUIs would have a consistent color palette, uniform intervals, and adequate size changes achieved through coordinated adjustments to the attributes of related components. Our experiments demonstrate the effectiveness and usefulness of AccessFixer in fixing GUI accessibility issues. After fixing 30 real-world apps, our approach solves an average of 81.2% of their accessibility issues. Also, we apply AccessFixer to 10 open-source apps by submitting the fixed results with pull requests (PRs) on GitHub. The results demonstrate that developers approve of our submitted fixed GUIs, with 8 PRs being merged or under fixing. A user study shows that low vision users hold a positive attitude toward the GUIs fixed by our method.
Submitted 20 February, 2025;
originally announced February 2025.
-
Don't Confuse! Redrawing GUI Navigation Flow in Mobile Apps for Visually Impaired Users
Authors:
Mengxi Zhang,
Huaxiao Liu,
Yuheng Zhou,
Chunyang Chen,
Pei Huang,
Jian Zhao
Abstract:
Mobile applications (apps) are integral to our daily lives, offering diverse services and functionalities. They enable sighted users to access information coherently in an extremely convenient manner. However, it remains unclear whether visually impaired users, who rely solely on screen readers (e.g., Talkback) to navigate and access app information, can do so in a correct and reasonable order. An incorrect order may result in significant information bias and operational errors. Considering these issues, in this work, we proposed a method named RGNF (Re-draw GUI Navigation Flow). It aimed to enhance the understandability and coherence of accessing the content of each component within the Graphical User Interface (GUI), together with assisting developers in creating well-designed GUI navigation flow (GNF). This method was inspired by the characteristics identified in our preliminary study, where visually impaired users expected navigation to be associated with close position and similar shape of GUI components that were read consecutively. Thus, our method relied on the principles derived from the Gestalt psychological model, aiming to group GUI components into different regions according to the laws of proximity and similarity, thereby redrawing the GNFs. To evaluate the effectiveness of our method, we calculated sequence similarity values before and after redrawing the GNF, and further employed the tools proposed by Alotaibi et al. to measure the reachability of GUI components. Our results demonstrated a substantial improvement in similarity (0.921) compared to the baseline (0.624), together with the reachability (90.31%) compared to the baseline GNF (74.35%). Furthermore, a qualitative user study revealed that our method had a positive effect on providing visually impaired users with an improved user experience.
Submitted 20 February, 2025;
originally announced February 2025.
-
SOFIA/HAWC+ Far-Infrared Polarimetric Large Area CMZ Exploration Survey. V. The Magnetic Field Strength and Morphology in the Sagittarius C Complex
Authors:
Roy J. Zhao,
Mark R. Morris,
David T. Chuss,
Dylan M. Paré,
Jordan A. Guerra,
Natalie O. Butterfield,
Edward J. Wollack,
Kaitlyn Karpovich
Abstract:
We present an analysis of the magnetic field strength and morphology in the Sagittarius C complex (Sgr C; G359.43-0.09) in the Milky Way Galaxy's Central Molecular Zone (CMZ) using the 214 $μ$m polarimetry data acquired with the High-Angular-Resolution Wideband Camera+ (HAWC+) instrument aboard the Stratospheric Observatory for Infrared Astronomy (SOFIA). We use several hundred magnetic field pseudovectors in the Sgr C region to trace the projected magnetic field orientation within cold molecular gas clouds, and as is the trend throughout the CMZ, they show a higher polarization fraction toward the periphery of the clouds. We conduct a modified Davis-Chandrasekhar-Fermi (DCF) analysis of individual clouds and find that the sky-plane magnetic field strength varies from highly turbulent regions having inferred strengths of $\sim30~μ{\rm G}$ to regions of relatively uniform field orientation having strengths of $\sim 1~{\rm mG}$. The magnetic field orientations suggest that outflows from active star-forming regions, such as the extended green object (EGO) G359.43-0.10 and the protostellar source FIR-4 (G359.43+0.02), cause high turbulence in their vicinity. The magnetic field direction is found to be tangential to the surface of the Sgr C HII region, as well as two [CII] emission cavities around this region. Several other features in the vicinity of Sgr C, especially numerous non-thermal filaments (NTFs) and a diffuse source of X-ray emission towards the southwest of the HII region, are discussed with regard to the observed magnetic field pseudovectors.
Submitted 20 February, 2025;
originally announced February 2025.
-
Study the impact of polarized background fields on coupling constants in EIC and EicC
Authors:
Cong Li,
Jing Zhao
Abstract:
In the polarized background field, the coupling constant will be influenced. We quantify this effect and propose that it can be measured at the EIC and EicC through the Bethe-Heitler process.
Submitted 20 February, 2025;
originally announced February 2025.
-
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks
Authors:
Jianwen Luo,
Yiming Huang,
Jinxiang Meng,
Fangyu Lei,
Shizhu He,
Xiao Liu,
Shanshan Jiang,
Bin Dong,
Jun Zhao,
Kang Liu
Abstract:
Large Language Models (LLMs) have shown great promise in tool-making, yet existing frameworks often struggle to efficiently construct reliable toolsets and are limited to single-task settings. To address these challenges, we propose GATE (Graph-based Adaptive Tool Evolution), an adaptive framework that dynamically constructs and evolves a hierarchical graph of reusable tools across multiple scenarios. We evaluate GATE on open-ended tasks (Minecraft), agent-based tasks (TextCraft, DABench), and code generation tasks (MATH, Date, TabMWP). Our results show that GATE achieves up to 4.3x faster milestone completion in Minecraft compared to the previous SOTA, and provides an average improvement of 9.23% over existing tool-making methods in code generation tasks and 10.03% in agent tasks. GATE demonstrates the power of adaptive evolution, balancing tool quantity, complexity, and functionality while maintaining high efficiency. Code and data are available at https://github.com/ayanami2003/GATE.
Submitted 20 February, 2025;
originally announced February 2025.
-
Isotropic superconductivity in pressurized trilayer nickelate La4Ni3O10
Authors:
Di Peng,
Yaolong Bian,
Zhenfang Xing,
Lixing Chen,
Jiaqiang Cai,
Tao Luo,
Fujun Lan,
Yuxin Liu,
Yinghao Zhu,
Enkang Zhang,
Zhaosheng Wang,
Yuping Sun,
Yuzhu Wang,
Xingya Wang,
Chenyue Wang,
Yuqi Yang,
Yanping Yang,
Hongliang Dong,
Hongbo Lou,
Zhidan Zeng,
Zhi Zeng,
Mingliang Tian,
Jun Zhao,
Qiaoshi Zeng,
Jinglei Zhang
, et al. (1 additional authors not shown)
Abstract:
Evidence of superconductivity (SC) has recently been reported in pressurized La3Ni2O7 and La4Ni3O10, providing a new platform to explore high-temperature superconductivity. However, while a zero-resistance state has been observed, characterization of the superconducting properties of pressurized nickelates is still limited and experimentally challenging. Here, we present the first measurement of the full temperature dependence of the upper critical field Hc2 in a La4Ni3O10 single crystal, achieved by combining high magnetic field and high-pressure techniques. Remarkably, the Hc2 of La4Ni3O10 is nearly isotropic, with the anisotropic parameter monotonically increasing from 1.4 near Tc to 1 at lower temperatures. By analyzing the Hc2 using the two-band model, we uncover that the anisotropic diffusivity of the bands, primarily originating from d(z2) and d(x2-y2) orbitals, is well compensated, resulting in an unusually isotropic superconducting state. These findings provide critical experimental evidence that underscores the significant role of the d(z2) orbital in enabling superconductivity in pressurized Ruddlesden-Popper nickelates.
Submitted 20 February, 2025;
originally announced February 2025.
-
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Authors:
Yue Huang,
Chujie Gao,
Siyuan Wu,
Haoran Wang,
Xiangqi Wang,
Yujun Zhou,
Yanbo Wang,
Jiayi Ye,
Jiawen Shi,
Qihui Zhang,
Yuan Li,
Han Bao,
Zhaoyi Liu,
Tianrui Guan,
Dongping Chen,
Ruoxi Chen,
Kehan Guo,
Andy Zou,
Bryan Hooi Kuen-Yew,
Caiming Xiong,
Elias Stengel-Eskin,
Hongyang Zhang,
Hongzhi Yin,
Huan Zhang,
Huaxiu Yao
, et al. (41 additional authors not shown)
Abstract:
Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards. Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components--metadata curation, test case generation, and contextual variation--to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs, which reveals the complex, evolving nature of trustworthiness, highlighting the nuanced trade-offs between utility and trustworthiness, and consideration for various downstream applications, identifying persistent challenges and providing a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. To facilitate advancement in the community, we release the toolkit for dynamic evaluation.
Submitted 11 May, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Are your apps accessible? A GCN-based accessibility checker for low vision users
Authors:
Mengxi Zhang,
Huaxiao Liu,
Shenning Song,
Chunyang Chen,
Pei Huang,
Jian Zhao
Abstract:
Context: Accessibility issues (e.g., small size and narrow interval) in mobile applications (apps) lead to obstacles for billions of low vision users in interacting with Graphical User Interfaces (GUIs). Although GUI accessibility scanning tools exist, most of them perform rule-based checks relying on complex GUI hierarchies. As a result, they may detect invisible redundant information, fail to handle small deviations, omit similar components, and be hard to extend. Objective: In this paper, we propose a novel approach, named ALVIN (Accessibility Checker for Low Vision), which represents the GUI as a graph and adopts the Graph Convolutional Neural Networks (GCN) to label inaccessible components. Method: ALVIN removes invisible views to prevent detecting redundancy and uses annotations from low vision users to handle small deviations. Also, the GCN model could consider the relations between GUI components, connecting similar components and reducing the possibility of omission. ALVIN only requires users to annotate the relevant dataset when detecting new kinds of issues. Results: Our experiments on 48 apps demonstrate the effectiveness of ALVIN, with precision of 83.5%, recall of 78.9%, and F1-score of 81.2%, outperforming baseline methods. In RQ2, the usefulness is verified through 20 issues submitted to open-source apps. RQ3 also shows that the GCN model performs better than other models. Conclusion: To summarize, our proposed approach can effectively detect accessibility issues in GUIs for low vision users, thereby guiding developers in fixing them efficiently.
Submitted 20 February, 2025;
originally announced February 2025.
-
Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach
Authors:
Yurong Wu,
Fangwen Mu,
Qiuhong Zhang,
Jinjing Zhao,
Xinrun Xu,
Lingrui Mei,
Yang Wu,
Lin Shi,
Junjie Wang,
Zhiming Ding,
Yiwei Wang
Abstract:
Prompt trading has emerged as a significant intellectual property concern in recent years, where vendors entice users by showcasing sample images before selling prompt templates that can generate similar images. This work investigates a critical security vulnerability: attackers can steal prompt templates using only a limited number of sample images. To investigate this threat, we introduce Prism, a prompt-stealing benchmark consisting of 50 templates and 450 images, organized into Easy and Hard difficulty levels. To identify the vulnerability of VLMs to prompt stealing, we propose EvoStealer, a novel template stealing method that operates without model fine-tuning by leveraging differential evolution algorithms. The system first initializes population sets using multimodal large language models (MLLMs) based on predefined patterns, then iteratively generates enhanced offspring through MLLMs. During evolution, EvoStealer identifies common features across offspring to derive generalized templates. Our comprehensive evaluation conducted across open-source (INTERNVL2-26B) and closed-source models (GPT-4o and GPT-4o-mini) demonstrates that EvoStealer's stolen templates can reproduce images highly similar to originals and effectively generalize to other subjects, significantly outperforming baseline methods with an average improvement of over 10%. Moreover, our cost analysis reveals that EvoStealer achieves template stealing with negligible computational expenses. Our code and dataset are available at https://github.com/whitepagewu/evostealer.
Submitted 20 February, 2025;
originally announced February 2025.
-
D.Va: Validate Your Demonstration First Before You Use It
Authors:
Qi Zhang,
Zhiqing Xiao,
Ruixuan Xiao,
Lirong Gao,
Junbo Zhao
Abstract:
In-context learning (ICL) has demonstrated significant potential in enhancing the capabilities of large language models (LLMs) during inference. It is well established that ICL heavily relies on selecting effective demonstrations to generate outputs that better align with the expected results. As for demonstration selection, previous approaches have typically relied on intuitive metrics to evaluate the effectiveness of demonstrations, which often results in limited robustness and poor cross-model generalization capabilities. To tackle these challenges, we propose a novel method, Demonstration VAlidation (D.Va), which integrates a demonstration validation perspective into this field. By introducing the demonstration validation mechanism, our method effectively identifies demonstrations that are both effective and highly generalizable. D.Va surpasses all existing demonstration selection techniques across both natural language understanding (NLU) and natural language generation (NLG) tasks. Additionally, we demonstrate the robustness and generalizability of our approach across various language models with different retrieval models.
Submitted 19 February, 2025;
originally announced February 2025.
-
Amplitude analysis of $ψ(3686)\to γK_S^0 K_S^0 $
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann,
H. Cai
, et al. (704 additional authors not shown)
Abstract:
Using $(2712\pm14)\times10^6$ $ψ(3686)$ events collected with the BESIII detector, we perform the first amplitude analysis of the radiative decay $ψ(3686)\to γK_S^0 K_S^0$ within the mass region $M_{K_S^0 K_S^0 }<2.8$ GeV/$c^2$. Employing a one-channel K-matrix approach for the description of the dynamics of the $K^0_S K^0_S$ system, the data sample is well described with four poles for the $f_0$-wave and three poles for the $f_2$-wave. The determined pole positions are consistent with those of well-established resonance states. The observed $f_0$ and $f_{2}$ states are found to be in agreement with those produced in radiative $J/ψ$ decays. The production behaviors of $f_0$ and $f_2$ poles in $ψ(3686)\toγK_S^0 K_S^0$ are qualified with their residues and the converted branching fractions. By comparing with $J/ψ\toγK_S^0 K_S^0$ decay, the ratios $\frac{\mathcal{B}(ψ(3686)\toγf_{0,2})}{\mathcal{B}(J/ψ\toγf_{0,2})}$ are determined, which provide crucial experimental inputs on the internal structure of the $f_{0,2}$ states, especially their potential mixing with glueball components.
Submitted 7 May, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.