Search | arXiv e-print repository

Halin graphs with positive Lin-Lu-Yau curvature

Authors: Kaizhe Chen, Huiqiu Lin, Shiping Liu, Zhe You

Abstract: Halin graphs constitute an interesting class of planar and polyhedral graphs. A generalized Halin graph is obtained by connecting all leaves of a planar embedding of a tree via a cycle. A Halin graph is a generalized Halin graph having no vertex of degree two. We classify all generalized Halin graphs with positive Lin-Lu-Yau curvature.

Submitted 7 May, 2025; originally announced May 2025.

MSC Class: 05C10; 05C81; 51F99

arXiv:2505.04093 [pdf, other]

Neutrino-jet correlations in charged-current SIDIS

Authors: Weihua Yang, Jing Zhao, Zhe Zhang

Abstract: Charged-current deep inelastic scattering plays a significant role in determining parton distribution functions with flavour separation. In this work, we present a systematic calculation of the charged-current semi-inclusive deep inelastic scattering (SIDIS) in the $eN$ collinear frame up to twist-3 level at leading order. Semi-inclusive refers to the process in which a jet is detected in addition to the scattered neutrino. We focus on neutrino-jet correlations in our calculation. We first present the differential cross section in terms of structure functions, followed by the differential cross section expressed in terms of transverse momentum dependent parton distribution functions. We derive the complete set of azimuthal asymmetries and intrinsic asymmetries. We also introduce an observable $A^C$, defined as the ratio of the difference to the sum of differential cross sections for electron and positron semi-inclusive deep inelastic scattering. We notice that $A^C$ not only provides a sensitive probe for valence quark distribution functions but also can reveal the violation of strange-antistrange symmetry.
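Note: the verbal definition of $A^C$ in this abstract can be written as a formula. The version below is only a sketch: the notation $d\sigma^{e^\mp}$ for the electron and positron differential cross sections and the sign convention (electron minus positron) are assumptions, since the abstract does not state them.

```latex
\[
  A^C \;=\; \frac{d\sigma^{e^-} - d\sigma^{e^+}}{d\sigma^{e^-} + d\sigma^{e^+}}
\]
```

With this form, $A^C$ lies between $-1$ and $1$ and vanishes when the two cross sections are equal.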

Submitted 6 May, 2025; originally announced May 2025.

arXiv:2505.03505 [pdf, other]

doi 10.1051/0004-6361/202452072

DCO$^+$ and DCN 1-0 survey toward a sample of Planck cold clumps

Authors: Fu Mo, Junzhi Wang, Shu Liu, Yan Duan, Huanxue Feng, Yuqiang Li, Zhe Lu, Rui Luo, Chao Ou, Yani Xu, Zhuoying Yan

Abstract: Deuterated molecules can be used to study the physical conditions and the astro-chemical evolution of molecular clouds. Large-sample surveys for deuterated molecules are needed to understand the enhancement of deuterated molecules from diffuse molecular gas to cold cores. A single-pointing survey toward the 559 Planck cold clumps of the Early Cold Core Catalogue (ECC) has been conducted using the Arizona Radio Observatory 12-meter telescope, focusing on the $J$=1-0 transitions of DCO$^+$ and DCN. The survey included observations of 309 cores for DCO$^+$ and DCN 1-0 simultaneously, followed by 71 of these cores where DCO$^+$ 1-0 was detected for H$^{13}$CO$^+$ and H$^{13}$CN 1-0 simultaneously, aiming to determine the deuterated fraction ($D_{\rm frac}$). Additionally, 250 cores were observed for DCO$^+$, DCN, H$^{13}$CO$^+$ and H$^{13}$CN 1-0 simultaneously. Among the 309 sources, DCO$^+$ and DCN 1-0 were detected in 79 and 11 sources, with detection rates of 25.6 % and 3.6 % respectively. In the 250 sources observed for all four species, DCO$^+$, DCN, H$^{13}$CO$^+$ and H$^{13}$CN 1-0 were detected in 58, 9, 57 and 13 sources, with detection rates of 23.2 %, 3.6 %, 22.8 % and 5.2 % respectively. The $D_{\rm frac}$(HCO$^+$) values in 112 sources range from 0.89 % to 7.4 % with a median value of 3.1 %, while $D_{\rm frac}$(HCN) values in 11 sources range from 1.5 % to 5.5 % with a median value of 2.3 %. The line widths of DCO$^+$ and H$^{13}$CO$^+$ 1-0 detections are mostly within 1 km s$^{-1}$. The similarity in $D_{\rm frac}$ values between HCO$^+$ and HCN indicates that the higher detection rate of DCO$^+$ 1-0 compared with DCN 1-0 is due to the lower critical density of DCO$^+$ 1-0. We suggest that the enhancement of DCO$^+$ and DCN likely begins in the early diffuse stage of the molecular cloud, rather than during the cold core formation stage.
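Note: the detection rates quoted in this abstract follow directly from the source counts. The short check below recomputes them; the dictionary layout, sample labels, and function name are illustrative choices, not anything taken from the paper.

```python
# Counts quoted in the abstract: sample size and detections per line.
# The data layout and names here are illustrative assumptions.
SURVEYS = {
    "309-source sample": (309, {"DCO+": 79, "DCN": 11}),
    "250-source sample": (250, {"DCO+": 58, "DCN": 9, "H13CO+": 57, "H13CN": 13}),
}


def detection_rates(total, detections):
    """Return detection rates in percent, rounded to one decimal place."""
    return {line: round(100.0 * n / total, 1) for line, n in detections.items()}


rates = {name: detection_rates(total, det) for name, (total, det) in SURVEYS.items()}
for name, per_line in rates.items():
    print(name, per_line)
```

The recomputed values agree with the percentages stated in the abstract for both samples.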

Submitted 6 May, 2025; originally announced May 2025.

Comments: 37 pages, 12 figures, published in A&A

Journal ref: 2025A&A...696A.140M

arXiv:2505.02795 [pdf, other]

HSplitLoRA: A Heterogeneous Split Parameter-Efficient Fine-Tuning Framework for Large Language Models

Authors: Zheng Lin, Yuxin Zhang, Zhe Chen, Zihan Fang, Xianhao Chen, Praneeth Vepakomma, Wei Ni, Jun Luo, Yue Gao

Abstract: Recently, large language models (LLMs) have achieved remarkable breakthroughs, revolutionizing the natural language processing domain and beyond. Due to immense parameter sizes, fine-tuning these models with private data for diverse downstream tasks has become mainstream. Though federated learning (FL) offers a promising solution for fine-tuning LLMs without sharing raw data, substantial computing costs hinder its democratization. Moreover, in real-world scenarios, private client devices often possess heterogeneous computing resources, further complicating LLM fine-tuning. To combat these challenges, we propose HSplitLoRA, a heterogeneous parameter-efficient fine-tuning (PEFT) framework built on split learning (SL) and low-rank adaptation (LoRA) fine-tuning, for efficiently fine-tuning LLMs on heterogeneous client devices. HSplitLoRA first identifies important weights based on their contributions to LLM training. It then dynamically configures the decomposition ranks of LoRA adapters for selected weights and determines the model split point according to varying computing budgets of client devices. Finally, a noise-free adapter aggregation mechanism is devised to support heterogeneous adapter aggregation without introducing noise. Extensive experiments demonstrate that HSplitLoRA outperforms state-of-the-art benchmarks in training accuracy and convergence speed.

Submitted 5 May, 2025; originally announced May 2025.

Comments: 16 pages, 22 figures

arXiv:2505.01992 [pdf, ps, other]

Supermassive Black Holes with High Accretion Rates in Active Galactic Nuclei. XII. Reverberation Mapping Results for 15 PG Quasars from a Long-Duration High-Cadence Campaign

Authors: Chen Hu, Sha-Sha Li, Sen Yang, Zi-Xu Yang, Wei-Jian Guo, Dong-Wei Bao, Bo-Wei Jiang, Pu Du, Yan-Rong Li, Ming Xiao, Yu-Yang Songsheng, Zhe Yu, Jin-Ming Bai, Luis C. Ho, Michael S. Brotherton, Jesús Aceituno, Hartmut Winkler, Jian-Min Wang

Abstract: We present the first results from long-term high-cadence spectroscopic monitoring of 15 PG quasars with relatively strong Fe II emission as a part of a broader reverberation mapping campaign performed with the Calar Alto Observatory 2.2m telescope. The $V$-band, 5100 Å continuum, and H$β$ broad emission line light curves were measured for a set of quasars for between dozens and more than a hundred epochs from May 2017 to July 2020. Accurate time lags between the variations of the H$β$ broad line fluxes and the optical continuum strength are obtained for all 15 quasars, ranging from $17.0_{-3.2}^{+2.5}$ to $95.9_{-23.9}^{+7.1}$ days in the rest frame. The virial masses of the central supermassive black holes are derived for all 15 quasars, ranging between $0.50_{-0.19}^{+0.18}$ and $19.17_{-2.73}^{+2.98}$ in units of $10^7 M_\odot$. For 11 of the objects in our sample, this is the first reverberation analysis published. Of the rest, two objects have been the subject of previous reverberation studies, but we determine time lags for these that are only half as long as found in the earlier investigations, which had only been able to sample much more sparsely. The remaining two objects have previously been monitored with high sampling rates. Our results here are consistent with the earlier findings in the sense that the time lag and the line width vary inversely, consistent with virialization.

Submitted 4 May, 2025; originally announced May 2025.

Comments: 21 pages, 20 figures, published in ApJS, March 2021

Journal ref: 2021, ApJS, 253, 20

arXiv:2505.01981 [pdf, other]

Electrospray Thruster Plume Dynamics: Insights from Precise PP Coulomb Field Simulation

Authors: Zhe Liu, Yinjian Zhao

Abstract: Electrospray thrusters are an important type of micropropulsion system being developed for next-generation space missions, yet the primary challenge to their operational lifespan is propellant overspray resulting from wide plume angles driven by Coulomb interactions among charged droplets. While existing models often employ truncated Coulomb field approximations, such simplifications compromise accuracy in predicting divergence dynamics. In this study, a particle-particle (PP) simulation method is used to directly calculate the interactions between droplets in an electrospray plume, coupled with background electric field effects. The model uses a Boris pusher for numerical integration, validated through binary collision tests. Parametric analysis systematically evaluates six key variables, droplet charge, droplet mass, emission interval, droplet initial velocity, and electric field components, to quantify their impacts on plume divergence. The shape of the simulated electrospray plume and the velocity of the droplets in it are analyzed. Parametric analysis demonstrates that reducing droplet charge, increasing droplet mass, extending emission time intervals, and elevating initial drift velocity collectively reduce plume half-angle. These results quantitatively establish parameter-plume relationships, providing direct guidance for thruster optimization.
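Note: the particle-particle (PP) idea mentioned in this abstract, summing the untruncated Coulomb force over every droplet pair, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name and data layout are assumptions, and the Boris pusher and background electric field described in the abstract are deliberately left out.

```python
import math

K_COULOMB = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def coulomb_forces(positions, charges):
    """Direct O(N^2) pairwise Coulomb forces with no cutoff or truncation.

    positions: list of (x, y, z) tuples in metres
    charges:   list of charges in coulombs
    Returns a list of [Fx, Fy, Fz] forces in newtons, one per particle.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Separation vector pointing from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = K_COULOMB * charges[i] * charges[j] / r**3
            for k in range(3):
                forces[i][k] += scale * d[k]  # like charges push i away from j
                forces[j][k] -= scale * d[k]  # equal and opposite force on j
    return forces


# Two like-charged droplets 1 m apart on the x axis repel each other.
demo = coulomb_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1e-6, 1e-6])
print(demo)
```

The double loop visits each pair once, so the cost grows quadratically with the number of droplets; that cost is the usual reason truncated field approximations are used instead.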

Submitted 4 May, 2025; originally announced May 2025.

arXiv:2505.01978 [pdf, other]

Generation of 95-qubit genuine entanglement and verification of symmetry-protected topological phases

Authors: Tao Jiang, Jianbin Cai, Junxiang Huang, Naibin Zhou, Yukun Zhang, Jiahao Bei, Guoqing Cai, Sirui Cao, Fusheng Chen, Jiang Chen, Kefu Chen, Xiawei Chen, Xiqing Chen, Zhe Chen, Zhiyuan Chen, Zihua Chen, Wenhao Chu, Hui Deng, Zhibin Deng, Pei Ding, Xun Ding, Zhuzhengqi Ding, Shuai Dong, Bo Fan, Daojin Fan, et al. (130 additional authors not shown)

Abstract: Symmetry-protected topological (SPT) phases are fundamental features of cluster states, serving as key resources for measurement-based quantum computation (MBQC). Generating large-scale cluster states and verifying their SPT phases are essential steps toward practical MBQC, which, however, still present significant experimental challenges. In this work, we address these challenges by utilizing advanced superconducting hardware with optimized gate operations, enhanced readout fidelity, and error mitigation techniques. We successfully generate and verify 95-qubit one-dimensional and 72-qubit two-dimensional genuine entangled cluster states, achieving fidelities of $0.5603 \pm 0.0084$ and $0.5519 \pm 0.0054$, respectively. Leveraging these high-fidelity cluster states, we investigate SPT phases through quantum teleportation across all 95 qubits and demonstrate input-state-dependent robustness against symmetry-breaking perturbations, highlighting the practicality and intrinsic robustness of MBQC enabled by the SPT order. Our results represent a significant advancement in large-scale entanglement generation and topological phase simulation, laying the foundation for scalable and practical MBQC using superconducting quantum systems.

Submitted 3 May, 2025; originally announced May 2025.

Comments: Main text: 15 pages, 4 figures; supplementary materials: 42 pages, 19 figures. Total: 57 pages, 23 figures

arXiv:2505.01766 [pdf, other]

Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement

Authors: Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren

Abstract: Surgical workflow recognition is vital for automating tasks, supporting decision-making, and training novice surgeons, ultimately improving patient safety and standardizing procedures. However, data corruption can lead to performance degradation due to issues like occlusion from bleeding or smoke in surgical scenes and problems with data storage and transmission. In this case, we explore a robust graph-based multimodal approach to integrating vision and kinematic data to enhance accuracy and reliability. Vision data captures dynamic surgical scenes, while kinematic data provides precise movement information, overcoming limitations of visual recognition under adverse conditions. We propose a multimodal Graph Representation network with Adversarial feature Disentanglement (GRAD) for robust surgical workflow recognition in challenging scenarios with domain shifts or corrupted data. Specifically, we introduce a Multimodal Disentanglement Graph Network that captures fine-grained visual information while explicitly modeling the complex relationships between vision and kinematic embeddings through graph-based message modeling. To align feature spaces across modalities, we propose a Vision-Kinematic Adversarial framework that leverages adversarial training to reduce modality gaps and improve feature consistency. Furthermore, we design a Contextual Calibrated Decoder, incorporating temporal and contextual priors to enhance robustness against domain shifts and corrupted data. Extensive comparative and ablation experiments demonstrate the effectiveness of our model and proposed modules. Moreover, our robustness experiments show that our method effectively handles data corruption during storage and transmission, exhibiting excellent stability and robustness. Our approach aims to advance automated surgical workflow recognition, addressing the complexities and dynamism inherent in surgical procedures.

Submitted 3 May, 2025; originally announced May 2025.

Comments: Accepted by Information Fusion

arXiv:2505.01476 [pdf, other]

CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering

Authors: Zhe Zhang, Mingxiu Cai, Hanxiao Wang, Gaochang Wu, Tianyou Chai, Xiatian Zhu

Abstract: Unsupervised anomaly detection (UAD) seeks to localize the anomaly mask of an input image with respect to normal samples. Either by reconstructing normal counterparts (reconstruction-based) or by learning an image feature embedding space (embedding-based), existing approaches fundamentally rely on image-level or feature-level matching to derive anomaly scores. Often, such a matching process is inaccurate yet overlooked, leading to sub-optimal detection. To address this issue, we introduce the concept of cost filtering, borrowed from classical matching tasks, such as depth and flow estimation, into the UAD problem. We call this approach CostFilter-AD. Specifically, we first construct a matching cost volume between the input and normal samples, comprising two spatial dimensions and one matching dimension that encodes potential matches. To refine this, we propose a cost volume filtering network, guided by the input observation as an attention query across multiple feature layers, which effectively suppresses matching noise while preserving edge structures and capturing subtle anomalies. Designed as a generic post-processing plug-in, CostFilter-AD can be integrated with either reconstruction-based or embedding-based methods. Extensive experiments on MVTec-AD and VisA benchmarks validate the generic benefits of CostFilter-AD for both single- and multi-class UAD tasks. Code and models will be released at https://github.com/ZHE-SAPI/CostFilter-AD.

Submitted 23 May, 2025; v1 submitted 2 May, 2025; originally announced May 2025.

Comments: 25 pages, 12 figures, 20 tables, accepted by Forty-Second International Conference on Machine Learning (ICML 2025), link: https://icml.cc/virtual/2025/poster/46359

arXiv:2505.01273 [pdf, other]

Anti-adversarial Learning: Desensitizing Prompts for Large Language Models

Authors: Xuan Li, Zhe Yin, Xiaodong Gu, Beijun Shen

Abstract: With the widespread use of LLMs, preserving privacy in user prompts has become crucial, as prompts risk exposing privacy and sensitive data to the cloud LLMs. Traditional techniques like homomorphic encryption, secure multi-party computation, and federated learning face challenges due to heavy computational costs and user participation requirements, limiting their applicability in LLM scenarios. In this paper, we propose PromptObfus, a novel method for desensitizing LLM prompts. The core idea of PromptObfus is "anti-adversarial" learning, which perturbs privacy words in the prompt to obscure sensitive information while retaining the stability of model predictions. Specifically, PromptObfus frames prompt desensitization as a masked language modeling task, replacing privacy-sensitive terms with a [MASK] token. A desensitization model is trained to generate candidate replacements for each masked position. These candidates are subsequently selected based on gradient feedback from a surrogate model, ensuring minimal disruption to the task output. We demonstrate the effectiveness of our approach on three NLP tasks. Results show that PromptObfus effectively prevents privacy inference from remote LLMs while preserving task performance.

Submitted 25 April, 2025; originally announced May 2025.

arXiv:2505.00938 [pdf, other]

CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion

Authors: Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, Hui-Liang Shen

Abstract: Cross-domain few-shot object detection (CD-FSOD) aims to detect novel objects across different domains with limited class instances. Feature confusion, including object-background confusion and object-object confusion, presents significant challenges in both cross-domain and few-shot settings. In this work, we introduce CDFormer, a cross-domain few-shot object detection transformer against feature confusion, to address these challenges. The method specifically tackles feature confusion through two key modules: object-background distinguishing (OBD) and object-object distinguishing (OOD). The OBD module leverages a learnable background token to differentiate between objects and background, while the OOD module enhances the distinction between objects of different classes. Experimental results demonstrate that CDFormer outperforms previous state-of-the-art approaches, achieving 12.9% mAP, 11.0% mAP, and 10.4% mAP improvements under the 1/5/10 shot settings, respectively, when fine-tuned.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.21801 [pdf, other]

DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition

Authors: Z. Z. Ren, Zhihong Shao, Junxiao Song, Huajian Xin, Haocheng Wang, Wanjia Zhao, Liyue Zhang, Zhe Fu, Qihao Zhu, Dejian Yang, Z. F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao, Daya Guo, Chong Ruan

Abstract: We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving in Lean 4, with initialization data collected through a recursive theorem proving pipeline powered by DeepSeek-V3. The cold-start training procedure begins by prompting DeepSeek-V3 to decompose complex problems into a series of subgoals. The proofs of resolved subgoals are synthesized into a chain-of-thought process, combined with DeepSeek-V3's step-by-step reasoning, to create an initial cold start for reinforcement learning. This process enables us to integrate both informal and formal mathematical reasoning into a unified model. The resulting model, DeepSeek-Prover-V2-671B, achieves state-of-the-art performance in neural theorem proving, reaching 88.9% pass ratio on the MiniF2F-test and solving 49 out of 658 problems from PutnamBench. In addition to standard benchmarks, we introduce ProverBench, a collection of 325 formalized problems, to enrich our evaluation, including 15 selected problems from the recent AIME competitions (years 24-25). Further evaluation on these 15 AIME problems shows that the model successfully solves 6 of them. In comparison, DeepSeek-V3 solves 8 of these problems using majority voting, highlighting that the gap between formal and informal mathematical reasoning in large language models is substantially narrowing.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.21296 [pdf, other]

Fairness in Graph Learning Augmented with Machine Learning: A Survey

Authors: Renqiang Luo, Ziqi Xu, Xikun Zhang, Qing Qing, Huafei Huang, Enyan Dai, Zhe Wang, Bo Yang

Abstract: Augmenting specialised machine learning techniques into traditional graph learning models has achieved notable success across various domains, including federated graph learning, dynamic graph learning, and graph transformers. However, the intricate mechanisms of these specialised techniques introduce significant challenges in maintaining model fairness, potentially resulting in discriminatory outcomes in high-stakes applications such as recommendation systems, disaster response, criminal justice, and loan approval. This paper systematically examines the unique fairness challenges posed by Graph Learning augmented with Machine Learning (GL-ML). It highlights the complex interplay between graph learning mechanisms and machine learning techniques, emphasising how the augmentation of machine learning both enhances and complicates fairness. Additionally, we explore four critical techniques frequently employed to improve fairness in GL-ML methods. By thoroughly investigating the root causes and broader implications of fairness challenges in this rapidly evolving field, this work establishes a robust foundation for future research and innovation in GL-ML fairness.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.21054 [pdf, other]

FFCBA: Feature-based Full-target Clean-label Backdoor Attacks

Authors: Yangxu Yin, Honglong Chen, Yudong Gao, Peng Sun, Liantao Wu, Zhe Li, Weifeng Liu

Abstract: Backdoor attacks pose a significant threat to deep neural networks, as backdoored models would misclassify poisoned samples with specific triggers into target classes while maintaining normal performance on clean samples. Among these, multi-target backdoor attacks can simultaneously target multiple classes. However, existing multi-target backdoor attacks all follow the dirty-label paradigm, where poisoned samples are mislabeled, and most of them require an extremely high poisoning rate. This makes them easily detectable by manual inspection. In contrast, clean-label attacks are more stealthy, as they avoid modifying the labels of poisoned samples. However, they generally struggle to achieve stable and satisfactory attack performance and often fail to scale effectively to multi-target attacks. To address this issue, we propose the Feature-based Full-target Clean-label Backdoor Attacks (FFCBA) which consists of two paradigms: Feature-Spanning Backdoor Attacks (FSBA) and Feature-Migrating Backdoor Attacks (FMBA). FSBA leverages class-conditional autoencoders to generate noise triggers that align perturbed in-class samples with the original category's features, ensuring the effectiveness, intra-class consistency, inter-class specificity and natural-feature correlation of triggers. While FSBA supports swift and efficient attacks, its cross-model attack capability is relatively weak. FMBA employs a two-stage class-conditional autoencoder training process that alternates between using out-of-class samples and in-class samples. This allows FMBA to generate triggers with strong target-class features, making it highly effective for cross-model attacks. We conduct experiments on multiple datasets and models; the results show that FFCBA achieves outstanding attack performance and maintains desirable robustness against the state-of-the-art backdoor defenses.

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.20820 [pdf]

Experimental Observation of Extremely Strong Defect-Phonon Scatterings in Semiconductor Single Crystals

Authors: Zifeng Huang, Jianbo Liang, Yuxiang Wang, Zixuan Sun, Naoteru Shigekawa, Ming Li, Runsheng Wang, Zhe Cheng

Abstract: The role of doping in tailoring thermal transport in semiconductors is critical for efficient thermal management in electronic devices. While the effects of doping have been extensively studied to tune electrical properties, its impact on thermal transport has not yet been thoroughly explored, particularly with respect to experimental investigations into exceptionally strong non-Rayleigh defect-phonon scattering phenomena. Herein, by combining the high-quality growth and advanced characterizations of cubic silicon carbide single crystals with well controlled boron doping, we experimentally observe anomalously strong defect-phonon scatterings, among the strongest reported in common semiconductors, that exceed the predictions of the classic mass difference model by tens of times in magnitude. The measured thermal conductivities of doped 3C SiC match excellently with those predicted by first-principles calculations in which resonant scattering of low-frequency phonons is considered. Our findings not only shed light on the fundamental understanding of defect-phonon interactions but will also impact applications such as thermal management of electronics.

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.20624 [pdf, other]

PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval

Authors: Zihan Niu, Zheyong Xie, Shaosheng Cao, Chonggang Lu, Zheyu Ye, Tong Xu, Zuozhu Liu, Yan Gao, Jia Chen, Zhe Xu, Yi Wu, Yao Hu

Abstract: Social chatbots have become essential intelligent companions in daily scenarios ranging from emotional support to personal interaction. However, conventional chatbots with passive response mechanisms usually rely on users to initiate or sustain dialogues by bringing up new topics, resulting in diminished engagement and shortened dialogue duration. In this paper, we present PaRT, a novel framework enabling context-aware proactive dialogues for social chatbots through personalized real-time retrieval and generation. Specifically, PaRT first integrates user profiles and dialogue context into a large language model (LLM), which is initially prompted to refine user queries and recognize their underlying intents for the upcoming conversation. Guided by refined intents, the LLM generates personalized dialogue topics, which then serve as targeted queries to retrieve relevant passages from RedNote. Finally, we prompt LLMs with summarized passages to generate knowledge-grounded and engagement-optimized responses. Our approach has been running stably in a real-world production environment for more than 30 days, achieving a 21.77% improvement in the average duration of dialogues.

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.20193 [pdf, other]

ProFi-Net: Prototype-based Feature Attention with Curriculum Augmentation for WiFi-based Gesture Recognition

Authors: Zhe Cui, Shuxian Zhang, Kangzhi Lou, Le-Nam Tran

Abstract: This paper presents ProFi-Net, a novel few-shot learning framework for WiFi-based gesture recognition that overcomes the challenges of limited training data and sparse feature representations. ProFi-Net employs a prototype-based metric learning architecture enhanced with a feature-level attention mechanism, which dynamically refines the Euclidean distance by emphasizing the most discriminative feature dimensions. Additionally, our approach introduces a curriculum-inspired data augmentation strategy exclusively on the query set. By progressively incorporating Gaussian noise of increasing magnitude, the model is exposed to a broader range of challenging variations, thereby improving its generalization and robustness to overfitting. Extensive experiments conducted across diverse real-world environments demonstrate that ProFi-Net significantly outperforms conventional prototype networks and other state-of-the-art few-shot learning methods in terms of classification accuracy and training efficiency.

Submitted 28 April, 2025; originally announced April 2025.

Comments: This paper was accepted at The 9th APWeb-WAIM joint International Conference on Web and Big Data
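
The ProFi-Net abstract above describes a prototype-based classifier whose Euclidean distance is reweighted per feature dimension. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the fixed weight vectors are all assumptions made for illustration (in the paper the weights come from a learned attention mechanism).

```python
def prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def weighted_sq_distance(a, b, weights):
    """Squared Euclidean distance with one non-negative weight per dimension."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))


def classify(query, protos, weights):
    """Assign the query to the class whose prototype is nearest."""
    return min(protos, key=lambda label: weighted_sq_distance(query, protos[label], weights))


# Hypothetical toy data: two gesture classes, two support samples each.
support = {
    "swipe": [[1.0, 0.0], [1.0, 2.0]],
    "push": [[3.0, 5.0], [3.0, 7.0]],
}
protos = prototypes(support)
query = [2.8, 1.5]

uniform = classify(query, protos, [1.0, 1.0])   # plain Euclidean distance
attended = classify(query, protos, [1.0, 0.0])  # all weight on dimension 0
```

With uniform weights the query is closest to the "swipe" prototype; putting all the weight on the first dimension flips the decision to "push", which is the effect a feature-level attention vector is meant to have.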

arXiv:2504.20178 [pdf, other]

A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals

Authors: Zhe Cui, Yuli Li, Le-Nam Tran

Abstract: Current crowd-counting models often rely on single-modal inputs, such as visual images or wireless signal data, which can result in significant information loss and suboptimal recognition performance. To address these shortcomings, we propose TransFusion, a novel multimodal fusion-based crowd-counting model that integrates Channel State Information (CSI) with image data. By leveraging the powerful capabilities of Transformer networks, TransFusion effectively combines these two distinct data modalities, enabling the capture of comprehensive global contextual information that is critical for accurate crowd estimation. However, while transformers are well capable of capturing global features, they potentially fail to identify finer-grained, local details essential for precise crowd counting. To mitigate this, we incorporate Convolutional Neural Networks (CNNs) into the model architecture, enhancing its ability to extract detailed local features that complement the global context provided by the Transformer. Extensive experimental evaluations demonstrate that TransFusion achieves high accuracy with minimal counting errors while maintaining superior efficiency.

Submitted 28 April, 2025; originally announced April 2025.

Comments: This paper was accepted at IEEE WCNC 2025

arXiv:2504.19959 [pdf, ps, other]

From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

Authors: Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang

Abstract: Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used in industry to improve verification efficiency through structured and reusable testbenches, constructing these testbenches and generating sufficient stimuli remain challenging. These challenges arise from the considerable manual coding effort required, repetitive manual execution of multiple EDA tools, and the need for in-depth domain expertise to navigate complex designs. Here, we present UVM^2, an automated verification framework that leverages Large Language Models (LLMs) to generate UVM testbenches and iteratively refine them using coverage feedback, significantly reducing manual effort while maintaining rigorous verification standards. To evaluate UVM^2, we introduce a benchmark suite comprising Register Transfer Level (RTL) designs of up to 1.6K lines of code. The results show that UVM^2 reduces testbench setup time by up to UVM^2 compared to experienced engineers, and achieves average code and function coverage of 87.44% and 89.58%, outperforming state-of-the-art solutions by 20.96% and 23.51%, respectively.

Submitted 28 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

arXiv:2504.19432 [pdf, other]

EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation

Authors: Zhe Dong, Yuzhe Sun, Tianzhu Liu, Wangmeng Zuo, Yanfeng Gu

Abstract: Satellite imagery and maps, as two fundamental data modalities in remote sensing, offer direct observations of the Earth's surface and human-interpretable geographic abstractions, respectively. The task of bidirectional translation between satellite images and maps (BSMT) holds significant potential for applications in urban planning and disaster response. However, this task presents two major challenges: first, the absence of precise pixel-wise alignment between the two modalities substantially complicates the translation process; second, it requires achieving both high-level abstraction of geographic features and high-quality visual synthesis, which further elevates the technical complexity. To address these limitations, we introduce EarthMapper, a novel autoregressive framework for controllable bidirectional satellite-map translation. EarthMapper employs geographic coordinate embeddings to anchor generation, ensuring region-specific adaptability, and leverages multi-scale feature alignment within a geo-conditioned joint scale autoregression (GJSA) process to unify bidirectional translation in a single training cycle. A semantic infusion (SI) mechanism is introduced to enhance feature-level consistency, while a key point adaptive guidance (KPAG) mechanism is proposed to dynamically balance diversity and precision during inference. We further contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities, enabling robust benchmarking. Extensive experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance, achieving significant improvements in visual realism, semantic consistency, and structural fidelity over state-of-the-art methods. Additionally, EarthMapper excels in zero-shot tasks like in-painting, out-painting and coordinate-conditional generation, underscoring its versatility.

Submitted 27 April, 2025; originally announced April 2025.

arXiv:2504.19099 [pdf, other]

VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction

Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in debugging for various programming languages. However, the application of LLMs to Verilog debugging remains insufficiently explored. Here, we present VeriDebug, an approach that integrates contrastive representation and guided correction capabilities for automated Verilog debugging. Unlike existing methods, VeriDebug employs an embedding-based technique to accurately retrieve internal information, followed by bug-fixing. VeriDebug unifies Verilog bug detection and correction through a shared parameter space. By simultaneously learning bug patterns and fixes, it streamlines debugging via contrastive embedding and guided correction. Empirical results show the efficacy of VeriDebug in enhancing Verilog debugging. Our VeriDebugLoc, Type model achieves 64.7 accuracy in bug fixing (Acc1), a significant improvement over the 11.3 of the existing open-source SOTA. This performance not only outperforms open-source alternatives but also exceeds larger closed-source models like GPT-3.5-turbo (36.6), offering a more accurate alternative to conventional debugging methods.

Submitted 27 April, 2025; originally announced April 2025.

arXiv:2504.18881 [pdf, other]

TSCAN: Context-Aware Uplift Modeling via Two-Stage Training for Online Merchant Business Diagnosis

Authors: Hangtao Zhang, Zhe Li, Kairui Zhang

Abstract: A primary challenge in ITE estimation is sample selection bias. Traditional approaches utilize treatment regularization techniques such as the Integral Probability Metrics (IPM), re-weighting, and propensity score modeling to mitigate this bias. However, these regularizations may introduce undesirable information loss and limit the performance of the model. Furthermore, treatment effects vary across different external contexts, and the existing methods are insufficient in fully interacting with and utilizing these contextual features. To address these issues, we propose a Context-Aware uplift model based on the Two-Stage training approach (TSCAN), comprising CAN-U and CAN-D sub-models. In the first stage, we train an uplift model, called CAN-U, which includes the treatment regularizations of IPM and propensity score prediction, to generate a complete dataset with counterfactual uplift labels. In the second stage, we train a model named CAN-D, which utilizes an isotonic output layer to directly model uplift effects, thereby eliminating the reliance on the regularization components. CAN-D adaptively corrects the errors estimated by CAN-U through reinforcing the factual samples, while avoiding the negative impacts associated with the aforementioned regularizations. Additionally, we introduce a Context-Aware Attention Layer throughout the two-stage process to manage the interactions between treatment, merchant, and contextual features, thereby modeling the varying treatment effect in different contexts. We conduct extensive experiments on two real-world datasets to validate the effectiveness of TSCAN. Ultimately, the deployment of our model for real-world merchant diagnosis on one of China's largest online food ordering platforms validates its practical utility and impact.

Submitted 26 April, 2025; originally announced April 2025.

Comments: 15 pages, 7 figures

arXiv:2504.17187 [pdf, other]

DualAttWaveNet: Multiscale Attention Networks for Satellite Interference Detection

Authors: Chunyu Yang, Boyu Yang, Kun Qiu, Zhe Chen, Yue Gao

Abstract: The escalating overlap between non-geostationary orbit (NGSO) and geostationary orbit (GSO) satellite frequency allocations necessitates accurate interference detection methods that address two pivotal technical gaps: computationally efficient signal analysis for real-time operation, and robust anomaly discrimination under varying interference patterns. Existing deep learning approaches employ encoder-decoder anomaly detectors that threshold input-output discrepancies for robustness. While the transformer-based TrID model achieves state-of-the-art performance (AUC: 0.8318, F1: 0.8321), its multi-head attention incurs prohibitive computation time, and its decoupled training of time-frequency models overlooks cross-domain dependencies. To overcome these problems, we propose DualAttWaveNet. A bidirectional attention fusion layer dynamically correlates time-domain samples using parameter-efficient cross-attention routing. A wavelet-regularized reconstruction loss enforces multi-scale consistency. We train the model on a public dataset consisting of 48 hours of satellite signals. Experiments show that, compared to TrID, DualAttWaveNet improves AUC by 12% and reduces inference time by 50% to 540 ms per batch while maintaining the F1-score.

Submitted 23 April, 2025; originally announced April 2025.

arXiv:2504.16122 [pdf, other]

SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation

Authors: Xuhui Zhou, Zhe Su, Sophie Feng, Jiaxu Zhou, Jen-tse Huang, Hsien-Te Kao, Spencer Lynch, Svitlana Volkova, Tongshuang Sherry Wu, Anita Woolley, Hao Zhu, Maarten Sap

Abstract: Social simulation through large language model (LLM) agents is a promising approach to explore and validate hypotheses related to social science questions and LLM agent behavior. We present SOTOPIA-S4, a fast, flexible, and scalable social simulation system that addresses the technical barriers of current frameworks while enabling practitioners to generate multi-turn and multi-party LLM-based interactions with customizable evaluation metrics for hypothesis testing. SOTOPIA-S4 comes as a pip package that contains a simulation engine, an API server with flexible RESTful APIs for simulation management, and a web interface that enables both technical and non-technical users to design, run, and analyze simulations without programming. We demonstrate the usefulness of SOTOPIA-S4 with two use cases involving dyadic hiring negotiation and multi-party planning scenarios.

Submitted 19 April, 2025; originally announced April 2025.

Comments: The first author and the second author contributed equally

arXiv:2504.15804 [pdf, other]

Insights from Verification: Training a Verilog Generation LLM with Reinforcement Learning with Testbench Feedback

Authors: Ning Wang, Bingkun Yao, Jie Zhou, Yuchen Hu, Xi Wang, Nan Guan, Zhe Jiang

Abstract: Large language models (LLMs) have shown strong performance in Verilog generation from natural language descriptions. However, ensuring the functional correctness of the generated code remains a significant challenge. This paper introduces a method that integrates verification insights from testbenches into the training of Verilog generation LLMs, aligning the training with the fundamental goal of hardware design: functional correctness. The main obstacle in using LLMs for Verilog code generation is the lack of sufficient functional verification data, particularly testbenches paired with design specifications and code. To address this problem, we introduce an automatic testbench generation pipeline that decomposes the process and uses feedback from the Verilog compiler simulator (VCS) to reduce hallucination and ensure correctness. We then use the testbench to evaluate the generated code and collect the results for further training, where verification insights are introduced. Our method applies reinforcement learning (RL), specifically direct preference optimization (DPO), to align Verilog code generation with functional correctness by training on preference pairs based on testbench outcomes. In evaluations on VerilogEval-Machine, VerilogEval-Human, RTLLM v1.1, RTLLM v2, and VerilogEval v2, our approach consistently outperforms state-of-the-art baselines in generating functionally correct Verilog code. We open source all training code, data, and models at https://anonymous.4open.science/r/VeriPrefer-E88B.

Submitted 22 April, 2025; originally announced April 2025.
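
The abstract above says the model is trained with direct preference optimization (DPO) on preference pairs derived from testbench outcomes. As a rough illustration, the standard DPO objective for a single pair can be sketched as follows. This is the generic textbook loss, not the authors' training code; the function name, the `beta` default, and the log-probability values are assumptions, with "chosen" standing for a candidate that passed the testbench and "rejected" for one that failed.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    (logp_*) and under the frozen reference model (ref_logp_*).
    """
    # How much more the policy favours the chosen sample than the reference does,
    # minus the same quantity for the rejected sample.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Hypothetical log-probabilities, chosen only to show the behaviour of the loss.
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)    # policy identical to reference
preferred = dpo_loss(-4.0, -9.0, -5.0, -5.0)  # policy already favours the passing code
```

When the policy matches the reference the loss equals log 2; it falls as the policy assigns relatively more probability to the passing sample than to the failing one.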

arXiv:2504.15722 [pdf, other]

From predictions to confidence intervals: an empirical study of conformal prediction methods for in-context learning

Authors: Zhe Huang, Simone Rossi, Rui Yuan, Thomas Hannagan

Abstract: Transformers have become a standard architecture in machine learning, demonstrating strong in-context learning (ICL) abilities that allow them to learn from the prompt at inference time. However, uncertainty quantification for ICL remains an open challenge, particularly in noisy regression tasks. This paper investigates whether ICL can be leveraged for distribution-free uncertainty estimation, proposing a method based on conformal prediction to construct prediction intervals with guaranteed coverage. While traditional conformal methods are computationally expensive due to repeated model fitting, we exploit ICL to efficiently generate confidence intervals in a single forward pass. Our empirical analysis compares this approach against ridge regression-based conformal methods, showing that conformal prediction with in-context learning (CP with ICL) achieves robust and scalable uncertainty estimates. Additionally, we evaluate its performance under distribution shifts and establish scaling laws to guide model training. These findings bridge ICL and conformal prediction, providing a theoretically grounded and new framework for uncertainty quantification in transformer-based models.

Submitted 22 April, 2025; originally announced April 2025.
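
For readers unfamiliar with conformal prediction, the sketch below shows split conformal prediction, a common variant that needs no refitting. It is offered only as background and is not the CP-with-ICL procedure of the paper above; the function name, the toy predictor `2 * x`, and the calibration data are assumptions.

```python
import math


def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval with marginal coverage >= 1 - alpha under exchangeability."""
    # Nonconformity scores: absolute residuals on held-out calibration data.
    scores = sorted(abs(y - predict(x)) for x, y in zip(calib_x, calib_y))
    n = len(scores)
    # Finite-sample corrected rank of the score used as the interval half-width.
    k = math.ceil((n + 1) * (1 - alpha))
    half_width = scores[k - 1] if k <= n else math.inf
    y_hat = predict(x_new)
    return y_hat - half_width, y_hat + half_width


def predict(x):
    """Hypothetical pre-trained point predictor."""
    return 2 * x


# Hypothetical calibration set: true values deviate from 2 * x by known residuals.
calib_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
residuals = [0.25, -0.5, 0.75, -1.0, 1.25, -1.5, 1.75, -2.0, 2.25, -2.5]
calib_y = [2 * x + r for x, r in zip(calib_x, residuals)]

lo, hi = split_conformal_interval(predict, calib_x, calib_y, x_new=4, alpha=0.1)
```

With ten calibration points and `alpha=0.1` the rank is ceil(11 × 0.9) = 10, so the half-width is the largest calibration residual (2.5) and the interval around the prediction 8 is (5.5, 10.5).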

arXiv:2504.15721 [pdf, other]

BBAL: A Bidirectional Block Floating Point-Based Quantisation Accelerator for Large Language Models

Authors: Xiaomeng Han, Yuan Cheng, Jing Wang, Junyang Lu, Hui Wang, X. x. Zhang, Ning Xu, Dawei Yang, Zhe Jiang

Abstract: Large language models (LLMs), with their billions of parameters, pose substantial challenges for deployment on edge devices, straining both memory capacity and computational resources. Block Floating Point (BFP) quantisation reduces memory and computational overhead by converting high-overhead floating point operations into low-bit fixed point operations. However, BFP requires aligning all data to the maximum exponent, which causes loss of small and moderate values, resulting in quantisation error and degradation in the accuracy of LLMs. To address this issue, we propose a Bidirectional Block Floating Point (BBFP) data format, which reduces the probability of selecting the maximum as the shared exponent, thereby reducing quantisation error. By utilizing the features of BBFP, we present a full-stack Bidirectional Block Floating Point-Based Quantisation Accelerator for LLMs (BBAL), primarily comprising a processing element array based on BBFP, paired with a proposed cost-effective nonlinear computation unit. Experimental results show BBAL achieves a 22% improvement in accuracy compared to an outlier-aware accelerator at similar efficiency, and a 40% efficiency improvement over a BFP-based accelerator at similar accuracy.

Submitted 22 April, 2025; originally announced April 2025.
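
The BBAL abstract above notes that conventional BFP aligns every value in a block to the maximum exponent and thereby loses small values. The sketch below illustrates that baseline behaviour only; it does not implement the proposed BBFP format. The function name, the 4-bit mantissa, and the sample block are assumptions.

```python
import math


def bfp_quantise(block, mantissa_bits=4):
    """Quantise a block of floats with one shared exponent (conventional BFP)."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block), 0
    # Shared exponent taken from the largest magnitude: max_abs = m * 2**e, 0.5 <= m < 1.
    shared_exp = math.frexp(max_abs)[1]
    # Spacing of the fixed-point grid; one mantissa bit is reserved for the sign.
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    quantised = []
    for x in block:
        q = max(-limit, min(limit, round(x / step)))  # signed integer mantissa
        quantised.append(q * step)
    return quantised, shared_exp


# Hypothetical block with one large value and several small ones.
block = [6.0, 0.9, 0.2, -0.05]
values, shared_exp = bfp_quantise(block, mantissa_bits=4)
```

Here the largest value forces a grid spacing of 1.0, so 0.2 and -0.05 both collapse to zero while 6.0 is kept exactly. This is the loss of small and moderate values that the abstract attributes to max-exponent alignment.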

arXiv:2504.15279 [pdf, other]

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Authors: Weiye Xu, Jiahao Wang, Weiyun Wang, Zhe Chen, Wengang Zhou, Aijun Yang, Lewei Lu, Houqiang Li, Xiaohua Wang, Xizhou Zhu, Wenhai Wang, Jifeng Dai, Jinguo Zhu

Abstract: Visual reasoning is a core component of human intelligence and a critical capability for advanced multimodal models. Yet current reasoning evaluations of multimodal large language models (MLLMs) often rely on text descriptions and allow language-based reasoning shortcuts, failing to measure genuine vision-centric reasoning. To address this, we introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories (e.g., quantitative shifts, spatial relations, attribute comparisons). These various types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. We evaluate leading MLLMs on this benchmark and analyze their results to identify common failure modes. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans, revealing significant gaps in visual reasoning. Furthermore, we provide a supplementary training dataset and a reinforcement-learning baseline to support further progress.

Submitted 21 April, 2025; originally announced April 2025.

Comments: Code, data, and baselines are available at https://visulogic-benchmark.github.io/VisuLogic

arXiv:2504.14352 [pdf, other]

Connectivity versus Lin-Lu-Yau curvature

Authors: Kaizhe Chen, Shiping Liu, Zhe You

Abstract: We explore the interaction between connectivity and Lin-Lu-Yau curvature of graphs systematically. The intuition is that connected graphs with large Lin-Lu-Yau curvature also have large connectivity, and vice versa. We prove that the connectivity of a connected graph is lower bounded by the product of its minimum degree and its Lin-Lu-Yau curvature. On the other hand, if the connectivity of a graph $G$ on $n$ vertices is at least $\frac{n-1}{2}$, then $G$ has positive Lin-Lu-Yau curvature. Moreover, the bound $\frac{n-1}{2}$ here is optimal. Furthermore, we prove that the edge-connectivity is equal to the minimum vertex degree for any connected graph with positive Lin-Lu-Yau curvature. As applications, we estimate or determine the connectivity and edge-connectivity of an amply regular graph with parameters $(d,α,β)$ such that $1\neq β\geq α$.

Submitted 19 April, 2025; originally announced April 2025.

Comments: 22 pages

arXiv:2504.14109 [pdf, other]

Time-varying treatment effect models in stepped-wedge cluster-randomized trials with multiple interventions

Authors: Zhe Chen, Wei Wang, Yingying Lu, Scott D. Halpern, Katherine R. Courtright, Fan Li, Michael O. Harhay

Abstract: The traditional model specification of stepped-wedge cluster-randomized trials assumes a homogeneous treatment effect across time while adjusting for fixed-time effects. However, when treatment effects vary over time, the constant effect estimator may be biased. In the general setting of stepped-wedge cluster-randomized trials with multiple interventions, we derive the expected value of the constant effect estimator when the true treatment effects depend on exposure time periods. Applying this result to concurrent and factorial stepped wedge designs, we show that the estimator represents a weighted average of exposure-time-specific treatment effects, with weights that are not necessarily uniform across exposure periods. Extensive simulation studies reveal that ignoring time heterogeneity can result in biased estimates and poor coverage of the average treatment effect. In this study, we examine two models designed to accommodate multiple interventions with time-varying treatment effects: (1) a time-varying fixed treatment effect model, which allows treatment effects to vary by exposure time but remain fixed for each time point, and (2) a random treatment effect model, where the time-varying treatment effects are modeled as random deviations from an overall mean. In the simulations considered in this study, concurrent designs generally achieve higher power than factorial designs under a time-varying fixed treatment effect model, though the differences are modest. Finally, we apply the constant effect model and both time-varying treatment effect models to data from the Prognosticating Outcomes and Nudging Decisions in the Electronic Health Record (PONDER) trial. All three models indicate a lack of treatment effect for either intervention, though they differ in the precision of their estimates, likely due to variations in modeling assumptions.

Submitted 18 April, 2025; originally announced April 2025.

Comments: 22 pages

arXiv:2504.13914 [pdf, other]

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Authors: ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, et al. (249 additional authors not shown)

Abstract: We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed1.5-Thinking is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research. Model trial link: https://www.volcengine.com/experience/ark.

Submitted 29 April, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

arXiv:2504.13847 [pdf, other]

Interview AI-ssistant: Designing for Real-Time Human-AI Collaboration in Interview Preparation and Execution

Authors: Zhe Liu

Abstract: Recent advances in large language models (LLMs) offer unprecedented opportunities to enhance human-AI collaboration in qualitative research methods, including interviews. While interviews are highly valued for gathering deep, contextualized insights, interviewers often face significant cognitive challenges, such as real-time information processing, question adaptation, and rapport maintenance. My doctoral research introduces Interview AI-ssistant, a system designed for real-time interviewer-AI collaboration during both the preparation and execution phases. Through four interconnected studies, this research investigates the design of effective human-AI collaboration in interviewing contexts, beginning with a formative study of interviewers' needs, followed by a prototype development study focused on AI-assisted interview preparation, an experimental evaluation of real-time AI assistance during interviews, and a field study deploying the system in a real-world research setting. Beyond informing practical implementations of intelligent interview support systems, this work contributes to the Intelligent User Interfaces (IUI) community by advancing the understanding of human-AI collaborative interfaces in complex social tasks and establishing design guidelines for AI-enhanced qualitative research tools.

Submitted 3 March, 2025; originally announced April 2025.

Comments: 4 pages, 2 figures, submitted and accepted by IUI 2025 Doctoral Consortium

arXiv:2504.13807 [pdf, other]

DiffOG: Differentiable Policy Trajectory Optimization with Generalizability

Authors: Zhengtong Xu, Zichen Miao, Qiang Qiu, Zhe Zhang, Yu She

Abstract: Imitation learning-based visuomotor policies excel at manipulation tasks but often produce suboptimal action trajectories compared to model-based methods. Directly mapping camera data to actions via neural networks can result in jerky motions and difficulties in meeting critical constraints, compromising safety and robustness in real-world deployment. For tasks that require high robustness or strict adherence to constraints, ensuring trajectory quality is crucial. However, the lack of interpretability in neural networks makes it challenging to generate constraint-compliant actions in a controlled manner. This paper introduces differentiable policy trajectory optimization with generalizability (DiffOG), a learning-based trajectory optimization framework designed to enhance visuomotor policies. By leveraging the proposed differentiable formulation of trajectory optimization with transformer, DiffOG seamlessly integrates policies with a generalizable optimization layer. DiffOG refines action trajectories to be smoother and more constraint-compliant while maintaining alignment with the original demonstration distribution, thus avoiding degradation in policy performance. We evaluated DiffOG across 11 simulated tasks and 2 real-world tasks. The results demonstrate that DiffOG significantly enhances the trajectory quality of visuomotor policies while having minimal impact on policy performance, outperforming trajectory processing baselines such as greedy constraint clipping and penalty-based trajectory optimization. Furthermore, DiffOG achieves superior performance compared to existing constrained visuomotor policy. Please visit the project website for more details: https://zhengtongxu.github.io/diffog-website/.

Submitted 13 May, 2025; v1 submitted 18 April, 2025; originally announced April 2025.

arXiv:2504.13479 [pdf, other]

SFL-LEO: Asynchronous Split-Federated Learning Design for LEO Satellite-Ground Network Framework

Authors: Jiasheng Wu, Jingjing Zhang, Zheng Lin, Zhe Chen, Xiong Wang, Wenjun Zhu, Yue Gao

Abstract: Recently, the rapid development of LEO satellite networks has raised another widespread concern: data processing at satellites. However, achieving efficient computation at LEO satellites in highly dynamic satellite networks is challenging and remains an open problem when considering the constrained computation capability of LEO satellites. For the first time, we propose a novel distributed learning framework named SFL-LEO by combining Federated Learning (FL) with Split Learning (SL) to accommodate the high dynamics of LEO satellite networks and the constrained computation capability of LEO satellites by leveraging the periodical orbit traveling feature. The proposed scheme allows training locally by introducing an asynchronous training strategy, i.e., achieving local update when LEO satellites disconnect from the ground station, to provide much more training space and thus increase the training performance. Meanwhile, it aggregates client-side sub-models at the ground station and then distributes them to LEO satellites by borrowing the idea from the federated learning scheme. Experiment results driven by satellite-ground bandwidth measured in Starlink demonstrate that SFL-LEO provides accuracy similar to that of the conventional SL scheme because it can perform local training even within the disconnection duration.

Submitted 18 April, 2025; originally announced April 2025.

Comments: 13 pages, 14 figures

arXiv:2504.13207 [pdf, other]

BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction

Authors: Wenhua Wu, Tong Zhao, Chensheng Peng, Lei Yang, Yintao Wei, Zhe Liu, Hesheng Wang

Abstract: The road surface is the sole contact medium for wheels or robot feet. Reconstructing the road surface is crucial for unmanned vehicles and mobile robots. Recent studies on Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) have achieved remarkable results in scene reconstruction. However, they typically rely on multi-view image inputs and require prolonged optimization times. In this paper, we propose BEV-GS, a real-time single-frame road surface reconstruction method based on feed-forward Gaussian splatting. BEV-GS consists of a prediction module and a rendering module. The prediction module introduces separate geometry and texture networks following the Bird's-Eye-View paradigm. Geometric and texture parameters are directly estimated from a single frame, avoiding per-scene optimization. In the rendering module, we utilize grid Gaussian for road surface representation and novel view synthesis, which better aligns with road surface characteristics. Our method achieves state-of-the-art performance on the real-world dataset RSRD. The road elevation error reduces to 1.73 cm, and the PSNR of novel view synthesis reaches 28.36 dB. The prediction and rendering speeds are 26 and 2061 FPS, respectively, enabling high-accuracy and real-time applications. The code will be available at: https://github.com/cat-wwh/BEV-GS

Submitted 15 April, 2025; originally announced April 2025.

arXiv:2504.12636 [pdf, other]

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

Authors: Rongtao Xu, Jian Zhang, Minghao Guo, Youpeng Wen, Haoting Yang, Min Lin, Jianzheng Huang, Zhe Li, Kaidong Zhang, Liqiong Wang, Yuxuan Kuang, Meng Cao, Feng Zheng, Xiaodan Liang

Abstract: Robotic manipulation faces critical challenges in understanding spatial affordances--the "where" and "how" of object interactions--essential for complex manipulation tasks like wiping a board or stacking objects. Existing methods, including modular-based and end-to-end approaches, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points data and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention for motion-aware feature extraction and a Spatial Information Aggregation Layer for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.

Submitted 6 May, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.12292 [pdf, ps, other]

SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians

Authors: Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner

Abstract: Accurate, real-time 3D reconstruction of human heads from monocular images and videos underlies numerous visual applications. As 3D ground truth data is hard to come by at scale, previous methods have sought to learn from abundant 2D videos in a self-supervised manner. Typically, this involves the use of differentiable mesh rendering, which is effective but faces limitations. To improve on this, we propose SHeaP (Self-supervised Head Geometry Predictor Learned via 2D Gaussians). Given a source image, we predict a 3DMM mesh and a set of Gaussians that are rigged to this mesh. We then reanimate this rigged head avatar to match a target frame, and backpropagate photometric losses to both the 3DMM and Gaussian prediction networks. We find that using Gaussians for rendering substantially improves the effectiveness of this self-supervised approach. Training solely on 2D data, our method surpasses existing self-supervised approaches in geometric evaluations on the NoW benchmark for neutral faces and a new benchmark for non-neutral expressions. Our method also produces highly expressive meshes, outperforming state-of-the-art in emotion classification.

Submitted 16 April, 2025; originally announced April 2025.

Comments: For video demonstrations and additional materials please see https://nlml.github.io/sheap/

arXiv:2504.11845 [pdf, other]

Boosting Multi-View Stereo with Depth Foundation Model in the Absence of Real-World Labels

Authors: Jie Zhu, Bo Peng, Zhe Zhang, Bingzheng Liu, Jianjun Lei

Abstract: Learning-based Multi-View Stereo (MVS) methods have made remarkable progress in recent years. However, how to effectively train the network without using real-world labels remains a challenging problem. In this paper, driven by the recent advancements of vision foundation models, a novel method, termed DFM-MVS, is proposed to leverage the depth foundation model to generate an effective depth prior, so as to boost MVS in the absence of real-world labels. Specifically, a depth prior-based pseudo-supervised training mechanism is developed to simulate realistic stereo correspondences using the generated depth prior, thereby constructing effective supervision for the MVS network. Besides, a depth prior-guided error correction strategy is presented to leverage the depth prior as guidance to mitigate the error propagation problem inherent in the widely-used coarse-to-fine network structure. Experimental results on DTU and Tanks & Temples datasets demonstrate that the proposed DFM-MVS significantly outperforms existing MVS methods without using real-world labels.

Submitted 16 April, 2025; originally announced April 2025.

arXiv:2504.11773 [pdf, other]

TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion

Authors: Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, Guosheng Lin

Abstract: Radar-Camera depth estimation aims to predict dense and accurate metric depth by fusing input images and Radar data. Model efficiency is crucial for this task in pursuit of real-time processing on autonomous vehicles and robotic platforms. However, due to the sparsity of Radar returns, the prevailing methods adopt multi-stage frameworks with intermediate quasi-dense depth, which are time-consuming and not robust. To address these challenges, we propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion. Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed to capture and integrate the graph structures of Radar point clouds, delivering superior model efficiency and robustness without relying on the intermediate depth results. Moreover, TacoDepth can be flexible for different inference modes, providing a better balance of speed and accuracy. Extensive experiments are conducted to demonstrate the efficacy of our method. Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%. Our work provides a new perspective on efficient Radar-Camera depth estimation.

Submitted 16 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR 2025 (Oral Presentation)

arXiv:2504.11702 [pdf, other]

Clustering and analysis of user behaviour in blockchain: A case study of Planet IX

Authors: Dorottya Zelenyanszki, Zhe Hou, Kamanashis Biswas, Vallipuram Muthukkumarasamy

Abstract: Decentralised applications (dApps) that run on public blockchains have the benefit of trustworthiness and transparency as every activity that happens on the blockchain can be publicly traced through the transaction data. However, this introduces a potential privacy problem as this data can be tracked and analysed, which can reveal user-behaviour information. A user behaviour analysis pipeline was proposed to present how this type of information can be extracted and analysed to identify separate behavioural clusters that can describe how users behave in the game. The pipeline starts with the collection of transaction data, involving smart contracts, that is collected from a blockchain-based game called Planet IX. Both the raw transaction information and the transaction events are considered in the data collection. From this data, separate game actions can be formed and those are leveraged to present how and when the users conducted their in-game activities in the form of user flows. An extended version of these user flows also presents how the Non-Fungible Tokens (NFTs) are being leveraged in the user actions. The latter is given as input for a Graph Neural Network (GNN) model to provide graph embeddings for these flows which then can be leveraged by clustering algorithms to cluster user behaviours into separate behavioural clusters. We benchmark and compare well-known clustering algorithms as a part of the proposed method. The user behaviour clusters were analysed and visualised in a graph format. It was found that behavioural information can be extracted regarding the users that belong to these clusters. Such information can be exploited by malicious users to their advantage. To demonstrate this, a privacy threat model was also presented based on the results that correspond to multiple potentially affected areas.

Submitted 15 April, 2025; originally announced April 2025.

Comments: 15 pages, 8 figures, submitted to Blockchain: Research and Applications

arXiv:2504.11349 [pdf, other]

Explicit and Implicit Representations in AI-based 3D Reconstruction for Radiology: A Systematic Review

Authors: Yuezhe Yang, Boyu Yang, Yaqian Wang, Yang He, Xingbo Dong, Zhe Jin

Abstract: The demand for high-quality medical imaging in clinical practice and assisted diagnosis has made 3D reconstruction in radiological imaging a key research focus. Artificial intelligence (AI) has emerged as a promising approach to enhancing reconstruction accuracy while reducing acquisition and processing time, thereby minimizing patient radiation exposure and discomfort and ultimately benefiting clinical diagnosis. This review explores state-of-the-art AI-based 3D reconstruction algorithms in radiological imaging, categorizing them into explicit and implicit approaches based on their underlying principles. Explicit methods include point-based, volume-based, and Gaussian representations, while implicit methods encompass implicit prior embedding and neural radiance fields. Additionally, we examine commonly used evaluation metrics and benchmark datasets. Finally, we discuss the current state of development, key challenges, and future research directions in this evolving field. Our project is available at: https://github.com/Bean-Young/AI4Radiology.

Submitted 17 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

Comments: 20 pages, 5 figures, submitted to Medical Image Analysis

MSC Class: 68T45; ACM Class: I.4.5

arXiv:2504.11148 [pdf, other]

Super time-resolved tomography

Authors: Zhe Hu, Kalle Josefsson, Zisheng Yao, Francisco García-Moreno, Malgorzata Makowska, Yuhe Zhang, Pablo Villanueva-Perez

Abstract: Understanding 3D fundamental processes is crucial for academic and industrial applications. Nowadays, X-ray time-resolved tomography, or tomoscopy, is a leading technique for in-situ and operando 4D (3D+time) characterization. Despite its ability to achieve 1000 tomograms per second at large-scale X-ray facilities, its applicability is limited by the centrifugal forces exerted on samples and the challenges of developing suitable environments for such high-speed studies. Here, we introduce STRT, an approach that has the potential to enhance the temporal resolution of tomoscopy by at least an order of magnitude while preserving spatial resolution. STRT exploits a 4D DL reconstruction algorithm to produce high-fidelity 3D reconstructions at each time point, retrieved from a significantly reduced angular range of a few degrees compared to the 0-180 degrees of traditional tomoscopy. Thus, STRT enhances the temporal resolution compared to tomoscopy by a factor equal to the ratio between 180 degrees and the angular ranges used by STRT. In this work, we validate the 4D capabilities of STRT through simulations and experiments on droplet collision simulations and additive manufacturing processes. We anticipate that STRT will significantly expand the capabilities of 4D X-ray imaging, enabling previously unattainable studies in both academic and industrial contexts, such as materials formation and mechanical testing.

Submitted 15 April, 2025; originally announced April 2025.

arXiv:2504.10772 [pdf]

Scanning-free three-dimensional fluorescent dipoles imaging by polarization self-interference digital holography (pSIDH)

Authors: Tianlong Man, Wenxue Zhang, Lu Zhang, Ran Zheng, Hua Huang, Xinhui Liu, Hongqiang Zhou, Zhe Wang, Yuhong Wan

Abstract: Polarization microscopy provides insights into the structure and orientational organization of biomolecules and their architectures in cells. These key functional signatures, which are natively 3D, can only be detected in 2D in a single measurement in conventional polarization microscopy. It has so far been challenging to simultaneously capture the 3D structure and molecular orientation in a single frame of far-field intensity distribution, within the timescale of rapid spatial organization events of bio-complexes. We report an optical imaging method called pSIDH, which encodes multidimensional sample information, including 3D structures and dipole orientations, in the far-field fluorescence self-interference pattern. Computational reconstruction from the holographically extracted complex-valued light field provides optical-aberration-corrected 3D polarization images of the sample. In a pSIDH microscope incorporating a planar liquid crystal lens and a high numerical aperture objective, we demonstrate scanning-free 3D volumetric polarization imaging of fluorescently labelled samples, with computationally improved measurement accuracy in both the 3D spatial and polarization dimensions. pSIDH imaging of phalloidin-fluorophore-labelled U2OS cells provides a rapid tool for simultaneously capturing the 3D structural details and spatially averaged molecular orientation distributions of complex biological architectures such as actin filaments.

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.10525 [pdf]

BioChemInsight: An Open-Source Toolkit for Automated Identification and Recognition of Optical Chemical Structures and Activity Data in Scientific Publications

Authors: Zhe Wang, Fangtian Fu, Wei Zhang, Lige Yan, Yan Meng, Jianping Wu, Hui Wu, Gang Xu, Si Chen

Abstract: Automated extraction of chemical structures and their bioactivity data is crucial for accelerating drug discovery and enabling data-driven pharmaceutical research. Existing optical chemical structure recognition (OCSR) tools fail to autonomously associate molecular structures with their bioactivity profiles, creating a critical bottleneck in structure-activity relationship (SAR) analysis. Here, we present BioChemInsight, an open-source pipeline that integrates: (1) DECIMER Segmentation and MolVec for chemical structure recognition, (2) Qwen2.5-VL-32B for compound identifier association, and (3) PaddleOCR with Gemini-2.0-flash for bioactivity extraction and unit normalization. We evaluated the performance of BioChemInsight on 25 patents and 17 articles. BioChemInsight achieved 95% accuracy for tabular patent data (structure/identifier recognition), with lower accuracy in non-tabular patents (~80% structures, ~75% identifiers), plus 92.2% bioactivity extraction accuracy. For articles, it attained >99% identifier accuracy and 78-80% structure accuracy in non-tabular formats, plus 97.4% bioactivity extraction accuracy. The system generates ready-to-use SAR datasets, reducing data preprocessing time from weeks to hours while enabling applications in high-throughput screening and ML-driven drug design (https://github.com/dahuilangda/BioChemInsight).

Submitted 12 April, 2025; originally announced April 2025.

Comments: 20 pages, 7 figures

arXiv:2504.10479 [pdf, other]

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang , et al. (26 additional authors not shown)

Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single pre-training stage. This unified training paradigm effectively addresses the complexities and alignment challenges commonly encountered in conventional post-hoc training pipelines for MLLMs. To further improve performance and scalability, InternVL3 incorporates variable visual position encoding (V2PE) to support extended multimodal contexts, employs advanced post-training techniques such as supervised fine-tuning (SFT) and mixed preference optimization (MPO), and adopts test-time scaling strategies alongside an optimized training infrastructure. Extensive empirical evaluations demonstrate that InternVL3 delivers superior performance across a wide range of multi-modal tasks. In particular, InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new state-of-the-art among open-source MLLMs. Its capabilities remain highly competitive with leading proprietary models, including ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro, while also maintaining strong pure-language proficiency. In pursuit of open-science principles, we will publicly release both the training data and model weights to foster further research and development in next-generation MLLMs.

Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

Comments: Technical Report

arXiv:2504.10474 [pdf, other]

Co-optimizing Physical Reconfiguration Parameters and Controllers for an Origami-inspired Reconfigurable Manipulator

Authors: Zhe Chen, Li Chen, Hao Zhang, Jianguo Zhao

Abstract: Reconfigurable robots that can change their physical configuration post-fabrication have demonstrated their potential in adapting to different environments or tasks. However, it is challenging to determine how to optimally adjust reconfigurable parameters for a given task, especially when the controller depends on the robot's configuration. In this paper, we address this problem using a tendon-driven reconfigurable manipulator composed of multiple serially connected origami-inspired modules as an example. Under tendon actuation, these modules can achieve different shapes and motions, governed by joint stiffnesses (reconfiguration parameters) and the tendon displacements (control inputs). We leverage recent advances in co-optimization of design and control for robotic systems to treat reconfiguration parameters as design variables and optimize them using reinforcement learning techniques. We first establish a forward model based on the minimum potential energy method to predict the shape of the manipulator under tendon actuation. Using the forward model as the environment dynamics, we then co-optimize the control policy (on the tendon displacements) and joint stiffnesses of the modules for goal-reaching tasks while ensuring collision avoidance. Through co-optimization, we obtain optimized joint stiffnesses and the corresponding optimal control policy to enable the manipulator to accomplish tasks that would be infeasible with fixed reconfiguration parameters (i.e., fixed joint stiffness). We envision that the co-optimization framework can be extended to other reconfigurable robotic systems, enabling them to optimally adapt their configuration and behavior for diverse tasks and environments.

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.10160 [pdf, other]

MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning

Authors: Zhaopeng Feng, Shaosheng Cao, Jiahan Ren, Jiayuan Su, Ruizhe Chen, Yan Zhang, Zhe Xu, Yao Hu, Jian Wu, Zuozhu Liu

Abstract: Large-scale reinforcement learning (RL) methods have proven highly effective in enhancing the reasoning abilities of large language models (LLMs), particularly for tasks with verifiable solutions such as mathematics and coding. However, applying this idea to machine translation (MT), where outputs are flexibly formatted and difficult to automatically evaluate with explicit rules, remains underexplored. In this work, we introduce MT-R1-Zero, the first open-source adaptation of the R1-Zero RL framework for MT without supervised fine-tuning or cold-start. We propose a rule-metric mixed reward mechanism to guide LLMs towards improved translation quality via emergent reasoning. On the WMT 24 English-Chinese benchmark, our MT-R1-Zero-3B-Mix achieves competitive performance, surpassing TowerInstruct-7B-v0.2 by an average of 1.26 points. Meanwhile, our MT-R1-Zero-7B-Mix attains a high average score of 62.25 across all metrics, placing it on par with advanced proprietary models such as GPT-4o and Claude-3.5-Sonnet, while the MT-R1-Zero-7B-Sem variant achieves state-of-the-art scores on semantic metrics. Moreover, our work exhibits strong generalization capabilities on out-of-distribution MT tasks, robustly supporting multilingual and low-resource settings. Extensive analysis of model behavior across different initializations and reward metrics offers pioneering insight into the critical role of reward design, LLM adaptability, training dynamics, and emergent reasoning patterns within the R1-Zero paradigm for MT. Our code is available at https://github.com/fzp0424/MT-R1-Zero.

Submitted 14 April, 2025; originally announced April 2025.

Comments: Work in progress. Our code is available at https://github.com/fzp0424/MT-R1-Zero

arXiv:2504.09377 [pdf, other]

Beyond Degradation Conditions: All-in-One Image Restoration via HOG Transformers

Authors: Jiawei Wu, Zhifei Yang, Zhe Wang, Zhi Jin

Abstract: All-in-one image restoration, which aims to address diverse degradations within a unified framework, is critical for practical applications. However, existing methods rely on predicting and integrating degradation conditions, which can misactivate degradation-specific features in complex scenarios, limiting their restoration performance. To address this issue, we propose a novel all-in-one image restoration framework guided by Histograms of Oriented Gradients (HOG), named HOGformer. By leveraging the degradation-discriminative capability of HOG descriptors, HOGformer employs a dynamic self-attention mechanism that adaptively attends to long-range spatial dependencies based on degradation-aware HOG cues. To enhance the degradation sensitivity of attention inputs, we design a HOG-guided local dynamic-range convolution module that captures long-range degradation similarities while maintaining awareness of global structural information. Furthermore, we propose a dynamic interaction feed-forward module, efficiently increasing the model capacity to adapt to different degradations through channel-spatial interactions. Extensive experiments across diverse benchmarks, including adverse weather and natural degradations, demonstrate that HOGformer achieves state-of-the-art performance and generalizes effectively to complex real-world degradations. Code is available at https://github.com/Fire-friend/HOGformer.

Submitted 12 April, 2025; originally announced April 2025.

arXiv:2504.09223 [pdf, other]

DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models

Authors: Wenjin Ke, Zhe Li, Dong Li, Lu Tian, Emad Barsoum

Abstract: Improving the efficiency of inference in Large Language Models (LLMs) is a critical area of research. Post-training Quantization (PTQ) is a popular technique, but it often faces challenges at low-bit levels, particularly in downstream tasks. Quantization-aware Training (QAT) can alleviate this problem, but it requires significantly more computational resources. To tackle this, we introduced Weight-Decomposed Low-Rank Quantization-Aware Training (DL-QAT), which merges the advantages of QAT while training only less than 1% of the total parameters. Specifically, we introduce a group-specific quantization magnitude to adjust the overall scale of each quantization group. Within each quantization group, we use LoRA matrices to update the weight size and direction in the quantization space. We validated the effectiveness of our method on the LLaMA and LLaMA2 model families. The results show significant improvements over our baseline method across different quantization granularities. For instance, for LLaMA-7B, our approach outperforms the previous state-of-the-art method by 4.2% in MMLU on 3-bit LLaMA-7B model. Additionally, our quantization results on pre-trained models also surpass previous QAT methods, demonstrating the superior performance and efficiency of our approach.

Submitted 12 April, 2025; originally announced April 2025.

Journal ref: https://aclanthology.org/2024.emnlp-industry.10/

arXiv:2504.09189 [pdf]

Low latency global carbon budget reveals a continuous decline of the land carbon sink during the 2023/24 El Nino event

Authors: Piyu Ke, Philippe Ciais, Yitong Yao, Stephen Sitch, Wei Li, Yidi Xu, Xiaomeng Du, Xiaofan Gui, Ana Bastos, Sonke Zaehle, Ben Poulter, Thomas Colligan, Auke M. van der Woude, Wouter Peters, Zhu Liu, Zhe Jin, Xiangjun Tian, Yilong Wang, Junjie Liu, Sudhanshu Pandey, Chris O'Dell, Jiang Bian, Chuanlong Zhou, John Miller, Xin Lan, et al. (6 additional authors not shown)

Abstract: The high growth rate of atmospheric CO2 in 2023 was found to be caused by a severe reduction of the global net land carbon sink. Here we update the global CO2 budget from January 1st to July 1st 2024, during which El Niño drought conditions continued to prevail in the Tropics but ceased by March 2024. We used three dynamic global vegetation models (DGVMs), machine learning emulators of ocean models, three atmospheric inversions driven by observations from the second Orbiting Carbon Observatory (OCO-2) satellite, and near-real-time fossil CO2 emissions estimates. In a one-year period from July 2023 to July 2024 covering the El Niño 2023/24 event, we found a record-high CO2 growth rate of 3.66~$\pm$~0.09 ppm~yr$^{-1}$ ($\pm$~1 standard deviation) since 1979. Yet, the CO2 growth rate anomaly obtained after removing the long term trend is 1.1 ppm~yr$^{-1}$, which is marginally smaller than the July--July growth rate anomalies of the two major previous El Niño events in 1997/98 and 2015/16. The atmospheric CO2 growth rate anomaly was primarily driven by a 2.24 GtC~yr$^{-1}$ reduction in the net land sink including 0.3 GtC~yr$^{-1}$ of fire emissions, partly offset by a 0.38 GtC~yr$^{-1}$ increase in the ocean sink relative to the 2015--2022 July--July mean. The tropics accounted for 97.5\% of the land CO2 flux anomaly, led by the Amazon (50.6\%), central Africa (34\%), and Southeast Asia (8.2\%), with extra-tropical sources in South Africa and southern Brazil during April--July 2024.
Our three DGVMs suggest greater tropical CO2 losses in 2023/2024 than during the two previous large El Niño in 1997/98 and 2015/16, whereas inversions indicate losses more comparable to 2015/16. Overall, this update of the low latency budget highlights the impact of recent El Niño droughts in explaining the high CO2 growth rate until July 2024.

Submitted 12 April, 2025; originally announced April 2025.

Showing 101–150 of 3,897 results for author: Zhe