-
Artificial intelligence-enabled precision medicine for inflammatory skin diseases
Authors:
Alice Tang,
Maria Wei,
Anna Haemel,
Cindy La,
Marina Sirota,
Ernest Y. Lee
Abstract:
Recent advances in artificial intelligence (AI) and multimodal data collection are revolutionizing dermatology. Generative AI and machine learning approaches offer opportunities to enhance the diagnosis and treatment of inflammatory skin diseases, including atopic dermatitis, psoriasis, hidradenitis suppurativa, and autoimmune connective tissue disease. This review examines the current landscape of AI applications for inflammatory skin diseases and explores how generative AI and machine learning methods can advance the field through deep phenotyping, disease heterogeneity characterization, drug development, personalized medicine, and clinical care. We discuss the promises and challenges of these technologies and present a vision for their integration into clinical practice.
Submitted 14 May, 2025;
originally announced May 2025.
-
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
Authors:
Wenzhe Cai,
Jiaqi Peng,
Yuqiang Yang,
Yujian Zhang,
Meng Wei,
Hanqing Wang,
Yilun Chen,
Tai Wang,
Jiangmiao Pang
Abstract:
Learning navigation in dynamic open-world environments is an important yet challenging skill for robots. Most previous methods rely on precise localization and mapping or learn from expensive real-world demonstrations. In this paper, we propose the Navigation Diffusion Policy (NavDP), an end-to-end framework that is trained solely in simulation and transfers zero-shot to different embodiments in diverse real-world environments. The key ingredient of NavDP's network is the combination of diffusion-based trajectory generation and a critic function for trajectory selection, both conditioned only on local observation tokens encoded by a shared policy transformer. Given the privileged information of the global environment in simulation, we scale up the demonstrations of good quality to train the diffusion policy and formulate the critic value function targets with contrastive negative samples. Our demonstration generation approach achieves about 2,500 trajectories/GPU per day, 20$\times$ more efficient than real-world data collection, and results in a large-scale navigation dataset with 363.2 km of trajectories across 1,244 scenes. Trained with this simulation dataset, NavDP achieves state-of-the-art performance and consistently outstanding generalization capability on quadruped, wheeled, and humanoid robots in diverse indoor and outdoor environments. In addition, we present a preliminary attempt at using Gaussian Splatting for in-domain real-to-sim fine-tuning to further bridge the sim-to-real gap. Experiments show that adding such real-to-sim data can improve the success rate by 30% without hurting generalization capability.
Submitted 13 May, 2025;
originally announced May 2025.
-
Hillclimb-Causal Inference: A Data-Driven Approach to Identify Causal Pathways Among Parental Behaviors, Genetic Risk, and Externalizing Behaviors in Children
Authors:
Mengman Wei,
Qian Peng
Abstract:
Motivation: Externalizing behaviors in children, such as aggression, hyperactivity, and defiance, are influenced by complex interplays between genetic predispositions and environmental factors, particularly parental behaviors. Unraveling these intricate causal relationships can benefit from the use of robust data-driven methods.
Methods: We developed a method called Hillclimb-Causal Inference, a causal discovery approach that integrates the Hill Climb Search algorithm with a customized Linear Gaussian Bayesian Information Criterion (BIC). This method was applied to data from the Adolescent Brain Cognitive Development (ABCD) Study, which included parental behavior assessments, children's genotypes, and externalizing behavior measures. We performed dimensionality reduction to address multicollinearity among parental behaviors and assessed children's genetic risk for externalizing disorders using polygenic risk scores (PRS), which were computed based on GWAS summary statistics from independent cohorts. Once the causal pathways were identified, we employed structural equation modeling (SEM) to quantify the relationships within the model.
Results: We identified prominent causal pathways linking parental behaviors to children's externalizing outcomes. Parental alcohol misuse and broader behavioral issues exhibited notably stronger direct effects (0.33 and 0.20, respectively) compared to children's polygenic risk scores (0.07). Moreover, when considering both direct and indirect paths, parental substance misuse (alcohol, drug, and tobacco) collectively resulted in a total effect exceeding 1.1 on externalizing behaviors. Bootstrap and sensitivity analyses further validated the robustness of these findings.
Submitted 10 May, 2025;
originally announced May 2025.
-
Measurement of separate electron and positron spectra from 10 GeV to 20 GeV with the geomagnetic field on DAMPE
Authors:
DAMPE Collaboration,
F. Alemanno,
Q. An,
P. Azzarello,
F. C. T. Barbato,
P. Bernardini,
X. J. Bi,
H. Boutin,
I. Cagnoli,
M. S. Cai,
E. Casilli,
E. Catanzani,
J. Chang,
D. Y. Chen,
J. L. Chen,
Z. F. Chen,
Z. X. Chen,
P. Coppin,
M. Y. Cui,
T. S. Cui,
Y. X. Cui,
I. DeMitri,
F. dePalma,
A. DiGiovanni,
T. K. Dong
, et al. (127 additional authors not shown)
Abstract:
The cosmic-ray (CR) electrons and positrons in space are of great significance for studying the origin and propagation of cosmic rays. The satellite-borne experiment DArk Matter Particle Explorer (DAMPE) has been used to measure the separate electron and positron spectra, as well as the positron fraction. In this work, the Earth's magnetic field is used to distinguish CR electrons and positrons, as the DAMPE detector does not carry an onboard magnet. The energy range for the measurements is from 10 to 20 GeV, currently limited at high energy by the zenith-pointing orientation of DAMPE. The results are consistent with previous measurements based on the magnetic spectrometers of AMS-02 and PAMELA, while the results of Fermi-LAT appear to be systematically shifted to larger values.
Submitted 9 May, 2025;
originally announced May 2025.
-
WDMamba: When Wavelet Degradation Prior Meets Vision Mamba for Image Dehazing
Authors:
Jie Sun,
Heng Liu,
Yongzhen Wang,
Xiao-Ping Zhang,
Mingqiang Wei
Abstract:
In this paper, we reveal a novel haze-specific wavelet degradation prior observed through wavelet transform analysis, which shows that haze-related information predominantly resides in low-frequency components. Exploiting this insight, we propose a novel dehazing framework, WDMamba, which decomposes the image dehazing task into two sequential stages: low-frequency restoration followed by detail enhancement. This coarse-to-fine strategy enables WDMamba to effectively capture features specific to each stage of the dehazing process, resulting in high-quality restored images. Specifically, in the low-frequency restoration stage, we integrate Mamba blocks to reconstruct global structures with linear complexity, efficiently removing overall haze and producing a coarse restored image. Thereafter, the detail enhancement stage reinstates fine-grained information that may have been overlooked during the previous phase, culminating in the final dehazed output. Furthermore, to enhance detail retention and achieve more natural dehazing, we introduce a self-guided contrastive regularization during network training. By utilizing the coarse restored output as a hard negative example, our model learns more discriminative representations, substantially boosting the overall dehazing performance. Extensive evaluations on public dehazing benchmarks demonstrate that our method surpasses state-of-the-art approaches both qualitatively and quantitatively. Code is available at https://github.com/SunJ000/WDMamba.
Submitted 7 May, 2025;
originally announced May 2025.
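The WDMamba abstract's observation that haze-related information sits mostly in low-frequency wavelet components can be illustrated with a toy calculation. The sketch below is not the authors' code: the single-level Haar transform, the atmospheric scattering model (hazy = t * clear + A * (1 - t)), the 4x4 pixel values, and the names `haar2d`, `energy`, `AIRLIGHT`, and `TRANSMISSION` are all assumptions chosen for illustration.

```python
def haar2d(img):
    """One level of a 2D Haar transform (averaging convention) on an even-sized image."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for i in range(0, len(img), 2):
        row = {name: [] for name in bands}
        for j in range(0, len(img[0]), 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            row["LL"].append((a + b + c + d) / 4)  # local average (low frequency)
            row["LH"].append((a - b + c - d) / 4)  # left-minus-right difference
            row["HL"].append((a + b - c - d) / 4)  # top-minus-bottom difference
            row["HH"].append((a - b - c + d) / 4)  # diagonal difference
        for name in bands:
            bands[name].append(row[name])
    return bands


def energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for r in band for v in r)


# Hypothetical 4x4 "clear" image with intensities in [0, 1].
clear = [
    [0.20, 0.30, 0.60, 0.50],
    [0.25, 0.35, 0.55, 0.60],
    [0.70, 0.80, 0.10, 0.20],
    [0.75, 0.70, 0.15, 0.10],
]

# Assumed uniform haze: hazy = t * clear + A * (1 - t).
AIRLIGHT, TRANSMISSION = 1.0, 0.5
hazy = [[TRANSMISSION * p + AIRLIGHT * (1 - TRANSMISSION) for p in r] for r in clear]

# What the haze added, decomposed into wavelet sub-bands.
residual = [[h - p for h, p in zip(hr, cr)] for hr, cr in zip(hazy, clear)]
res_bands = haar2d(residual)
total = sum(energy(b) for b in res_bands.values())
ll_fraction = energy(res_bands["LL"]) / total
print(f"Fraction of haze residual energy in the LL band: {ll_fraction:.3f}")
```

In this toy setup most of the residual energy lands in the LL band, because uniform haze mainly shifts local averages and only rescales local differences. Real images and spatially varying haze will give different proportions.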
-
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
Authors:
Qianchu Liu,
Sheng Zhang,
Guanghui Qin,
Timothy Ossowski,
Yu Gu,
Ying Jin,
Sid Kiblawi,
Sam Preston,
Mu Wei,
Paul Vozila,
Tristan Naumann,
Hoifung Poon
Abstract:
Recent proprietary models (e.g., o3) have begun to demonstrate strong multimodal reasoning capabilities. Yet, most existing open-source research concentrates on training text-only reasoning models, with evaluations limited to mainly mathematical and general-domain tasks. Therefore, it remains unclear how to effectively extend reasoning capabilities beyond text input and general domains. This paper explores a fundamental research question: Is reasoning generalizable across modalities and domains? Our findings support an affirmative answer: General-domain text-based post-training can enable such strong generalizable reasoning. Leveraging this finding, we introduce X-Reasoner, a vision-language model post-trained solely on general-domain text for generalizable reasoning, using a two-stage approach: an initial supervised fine-tuning phase with distilled long chain-of-thoughts, followed by reinforcement learning with verifiable rewards. Experiments show that X-Reasoner successfully transfers reasoning capabilities to both multimodal and out-of-domain settings, outperforming existing state-of-the-art models trained with in-domain and multimodal data across various general and medical benchmarks (Figure 1). Additionally, we find that X-Reasoner's performance in specialized domains can be further enhanced through continued training on domain-specific text-only data. Building upon this, we introduce X-Reasoner-Med, a medical-specialized variant that achieves new state of the art on numerous text-only and multimodal medical benchmarks.
Submitted 6 May, 2025;
originally announced May 2025.
-
Record Magnetic Field Generation by Laser-Driven Capacitor-Coil Targets
Authors:
Lan Gao,
Yang Zhang,
Hantao Ji,
Brandon K. Russell,
Geoffrey Pomraning,
Jesse Griff-McMahon,
Sallee Klein,
Carolyn Kuranz,
Mingsheng Wei
Abstract:
Magnetic fields generated by capacitor-coil targets driven by intense short-pulse lasers have been characterized using ultrafast proton radiography. A 1-kJ, 15-ps laser at a center wavelength of 1053 nm irradiated the back plate of the capacitor with an intensity of $\sim$8.3 $\times$ 10$^{18}$ W$/$cm$^{2}$, creating ultra large currents in the connecting coils. High-quality proton data obtained in the axial probing geometry show definitive signatures of magnetic field generation allowing precision measurement of the field distribution and strength. The data show a peak coil current of 150 $\pm$ 20 kA producing 250 $\pm$ 30 Tesla magnetic fields at the coil center. This sets a new record for magnetic field generation by the short-pulse-powered capacitor-coil targets.
Submitted 4 May, 2025;
originally announced May 2025.
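As a rough consistency check on the figures quoted in the abstract above (150 kA peak current, 250 T at the coil center), the textbook formula for the field at the center of a single circular current loop, B = μ0 I / (2R), can be inverted for the loop radius. The abstract does not state the coil geometry, so the single-loop model and the function names below are assumptions; the result is an order-of-magnitude estimate, not a reported value.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability in T*m/A


def loop_center_field(current_a, radius_m):
    """Field (tesla) at the center of an ideal single circular loop."""
    return MU_0 * current_a / (2 * radius_m)


def loop_radius_for_field(current_a, field_t):
    """Radius (meters) of an ideal single loop giving field_t at its center."""
    return MU_0 * current_a / (2 * field_t)


# Values quoted in the abstract; the single-loop geometry is an assumption.
radius = loop_radius_for_field(150e3, 250.0)
print(f"Implied single-loop radius: {radius * 1e3:.2f} mm")
```

Under this idealization the quoted current and field correspond to a sub-millimeter loop radius. A real capacitor-coil target with multiple coils or a non-circular shape would change the estimate.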
-
Determining Magnetic and Electric Field Generations in Laser-Driven Coil Targets
Authors:
Yang Zhang,
Lan Gao,
Hantao Ji,
Brandon K. Russell,
Geoffrey Pomraning,
Jesse Griff-McMahon,
Sallee Klein,
Carolyn Kuranz,
Mingsheng Wei
Abstract:
Laser-driven capacitor coils are widely used to generate intense magnetic fields for various applications in high-energy-density physics research. Accurate measurement of the magnetic fields is essential but challenging, due to the overlapping contributions from magnetic and electric fields in proton radiography, which is the primary tool diagnosing the field generation around the coils. In this study, we systematically analyze proton radiographs obtained from laser-driven capacitor-coil targets along two orthogonal axes under various electromagnetic field conditions, including magnetic field only, electric field only, and combined electromagnetic fields. By analyzing key features in the radiographs, we distinguish and characterize the respective contributions from magnetic and electric fields. Using detailed simulations validated by experimental benchmarks, methods to isolate and quantify the magnetic field and electric field are given. The methods are successfully applied to determine the electric current and charge distribution in a double coil configuration. Our findings provide insights into improving the diagnostic capability of proton radiography, potentially leading to more accurate measurements of electromagnetic fields and enhancing the utility of laser-driven capacitor coils in high-energy-density experiments.
Submitted 4 May, 2025;
originally announced May 2025.
-
MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance
Authors:
Mengting Wei,
Yante Li,
Tuomas Varanka,
Yan Jiang,
Guoying Zhao
Abstract:
In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion control in existing video-based face generation approaches. Our approach employs the FLAME (Faces Learned with an Articulated Model and Expressions) model as the 3D face parametric representation, providing a unified framework for modeling face expressions and head pose. This not only enables precise extraction of motion features from driving videos, but also contributes to the faithful preservation of face shape and geometry. Specifically, we enhance the latent diffusion model with rich 3D expression and detailed pose information by incorporating depth maps, normal maps, and rendering maps derived from FLAME sequences. These maps serve as motion guidance and are encoded into the denoising UNet through a specifically designed Geometric Guidance Encoder (GGE). A multi-layer feature fusion module with integrated self-attention mechanisms is used to combine facial appearance and motion latent features within the spatial domain. By utilizing the 3D face parametric model as motion guidance, our method enables parametric alignment of face identity between the reference image and the motion captured from the driving video. Experimental results on benchmark datasets show that our method excels at generating high-quality face animations with precise expression and head pose variation modeling. In addition, it demonstrates strong generalization performance on out-of-domain images. Code is publicly available at https://github.com/weimengting/MagicPortrait.
Submitted 10 May, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
Authors:
Min Wei,
Chaohui Yu,
Jingkai Zhou,
Fan Wang
Abstract:
Video try-on replaces clothing in videos with target garments. Existing methods struggle to generate high-quality and temporally consistent results when handling complex clothing patterns and diverse body poses. We present 3DV-TON, a novel diffusion-based framework for generating high-fidelity and temporally consistent video try-on results. Our approach employs generated animatable textured 3D meshes as explicit frame-level guidance, alleviating the issue of models over-focusing on appearance fidelity at the expense of motion coherence. This is achieved by enabling direct reference to consistent garment texture movements throughout video sequences. The proposed method features an adaptive pipeline for generating dynamic 3D guidance: (1) selecting a keyframe for initial 2D image try-on, followed by (2) reconstructing and animating a textured 3D mesh synchronized with original video poses. We further introduce a robust rectangular masking strategy that successfully mitigates artifact propagation caused by leaking clothing information during dynamic human and garment movements. To advance video try-on research, we introduce HR-VVT, a high-resolution benchmark dataset containing 130 videos with diverse clothing types and scenarios. Quantitative and qualitative results demonstrate our superior performance over existing methods. The project page is at https://2y7c3.github.io/3DV-TON/
Submitted 24 April, 2025;
originally announced April 2025.
-
A Deep Learning Framework for Sequence Mining with Bidirectional LSTM and Multi-Scale Attention
Authors:
Tao Yang,
Yu Cheng,
Yaokun Ren,
Yujia Lou,
Minggu Wei,
Honghui Xin
Abstract:
This paper addresses the challenges of mining latent patterns and modeling contextual dependencies in complex sequence data. A sequence pattern mining algorithm is proposed by integrating Bidirectional Long Short-Term Memory (BiLSTM) with a multi-scale attention mechanism. The BiLSTM captures both forward and backward dependencies in sequences, enhancing the model's ability to perceive global contextual structures. At the same time, the multi-scale attention module assigns adaptive weights to key feature regions under different window sizes. This improves the model's responsiveness to both local and global important information. Extensive experiments are conducted on a publicly available multivariate time series dataset. The proposed model is compared with several mainstream sequence modeling methods. Results show that it outperforms existing models in terms of accuracy, precision, and recall. This confirms the effectiveness and robustness of the proposed architecture in complex pattern recognition tasks. Further ablation studies and sensitivity analyses are carried out to investigate the effects of attention scale and input sequence length on model performance. These results provide empirical support for structural optimization of the model.
Submitted 21 April, 2025;
originally announced April 2025.
-
RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild
Authors:
Jingkai Zhou,
Yifan Wu,
Shikai Li,
Min Wei,
Chao Fan,
Weihua Chen,
Wei Jiang,
Fan Wang
Abstract:
Controllable character animation remains a challenging problem, particularly in handling rare poses, stylized characters, character-object interactions, complex illumination, and dynamic scenes. To tackle these issues, prior work has largely focused on injecting pose and appearance guidance via elaborate bypass networks, but often struggles to generalize to open-world scenarios. In this paper, we propose a new perspective: as long as the foundation model is powerful enough, straightforward model modifications with flexible fine-tuning strategies can largely address the above challenges, taking a step towards controllable character animation in the wild. Specifically, we introduce RealisDance-DiT, built upon the Wan-2.1 video foundation model. Our thorough analysis reveals that the widely adopted Reference Net design is suboptimal for large-scale DiT models. Instead, we demonstrate that minimal modifications to the foundation model architecture yield a surprisingly strong baseline. We further propose the low-noise warmup and "large batches and small iterations" strategies to accelerate model convergence during fine-tuning while maximally preserving the priors of the foundation model. In addition, we introduce a new test dataset that captures diverse real-world challenges, complementing existing benchmarks such as the TikTok dataset and the UBC fashion video dataset, to comprehensively evaluate the proposed method. Extensive experiments show that RealisDance-DiT outperforms existing methods by a large margin.
Submitted 21 April, 2025;
originally announced April 2025.
-
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Authors:
Team Seawead,
Ceyuan Yang,
Zhijie Lin,
Yang Zhao,
Shanchuan Lin,
Zhibei Ma,
Haoyuan Guo,
Hao Chen,
Lu Qi,
Sen Wang,
Feng Cheng,
Feilong Zuo,
Xuejiao Zeng,
Ziyan Yang,
Fangyuan Kong,
Meng Wei,
Zhiwu Qing,
Fei Xiao,
Tuyen Hoang,
Siyu Zhang,
Peihao Zhu,
Qi Zhao,
Jiangqiao Yan,
Liangke Gui,
Sheng Bi
, et al. (30 additional authors not shown)
Abstract:
This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B, trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting. This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained on substantially greater GPU resources, and (2) our model, which exhibits strong generalization ability, can be effectively adapted across a wide range of downstream applications by either lightweight fine-tuning or continued training. See the project page at https://seaweed.video/
Submitted 4 May, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
DA2Diff: Exploring Degradation-aware Adaptive Diffusion Priors for All-in-One Weather Restoration
Authors:
Jiamei Xiong,
Xuefeng Yan,
Yongzhen Wang,
Wei Zhao,
Xiao-Ping Zhang,
Mingqiang Wei
Abstract:
Image restoration under adverse weather conditions is a critical task for many vision-based applications. Recent all-in-one frameworks that handle multiple weather degradations within a unified model have shown potential. However, the diversity of degradation patterns across different weather conditions, as well as the complex and varied nature of real-world degradations, pose significant challenges for multiple weather removal. To address these challenges, we propose an innovative diffusion paradigm with degradation-aware adaptive priors for all-in-one weather restoration, termed DA2Diff. It is a new exploration that applies CLIP to perceive degradation-aware properties for better multi-weather restoration. Specifically, we deploy a set of learnable prompts to capture degradation-aware representations through prompt-image similarity constraints in the CLIP space. By aligning the snowy/hazy/rainy images with snow/haze/rain prompts, each prompt captures different weather degradation characteristics. The learned prompts are then integrated into the diffusion model via the designed weather-specific prompt guidance module, making it possible to restore multiple weather types. To further improve the adaptiveness to complex weather degradations, we propose a dynamic expert selection modulator that employs a dynamic weather-aware router to flexibly dispatch varying numbers of restoration experts for each weather-distorted image, allowing the diffusion model to restore diverse degradations adaptively. Experimental results substantiate the favorable performance of DA2Diff over state-of-the-art methods in quantitative and qualitative evaluations. Source code will be available after acceptance.
Submitted 7 April, 2025;
originally announced April 2025.
-
Experimental Evidence of Vortex $γ$ Photons in All-Optical Inverse Compton Scattering
Authors:
Mingxuan Wei,
Siyu Chen,
Yu Wang,
Xichen Hu,
Mingyang Zhu,
Hao Hu,
Pei-Lun He,
Weijun Zhou,
Jiao Jia,
Li Lu,
Boyuan Li,
Feng Liu,
Min Chen,
Liming Chen,
Jian-Xing Li,
Wenchao Yan,
Jie Zhang
Abstract:
Vortex $γ$ photons carrying orbital angular momenta (OAM) hold great potential for various applications. However, their generation remains a great challenge. Here, we successfully generate sub-MeV vortex $γ$ photons via all-optical inverse Compton scattering of relativistic electrons colliding with a sub-relativistic Laguerre-Gaussian laser. In principle, directly measuring the OAM of $γ$ photons is challenging due to their incoherence and extremely short wavelength. Therefore, we put forward a novel method to determine the OAM properties by revealing the quantum opening angle of vortex $γ$ photons, since vortex particles exhibit not only a spiral phase but also transverse momentum according to the quantum electrodynamics theory. Thus, $γ$ photons carrying OAM manifest a much larger angular distribution than those without OAM, which has been clearly observed in our experiments. This angular expansion is considered an overall effect lying beyond classical theory. Our method provides the first experimental evidence for detecting vortex $γ$ photons and opens a new perspective for investigating OAM-induced quantum phenomena in broad fields.
Submitted 24 March, 2025;
originally announced March 2025.
-
CausalCLIPSeg: Unlocking CLIP's Potential in Referring Medical Image Segmentation with Causal Intervention
Authors:
Yaxiong Chen,
Minghong Wei,
Zixuan Zheng,
Jingliang Hu,
Yilei Shi,
Shengwu Xiong,
Xiao Xiang Zhu,
Lichao Mou
Abstract:
Referring medical image segmentation targets delineating lesions indicated by textual descriptions. Aligning visual and textual cues is challenging due to their distinct data properties. Inspired by large-scale pre-trained vision-language models, we propose CausalCLIPSeg, an end-to-end framework for referring medical image segmentation that leverages CLIP. Although CLIP is not trained on medical data, we enforce its rich semantic space onto the medical domain through a tailored cross-modal decoding method to achieve text-to-pixel alignment. Furthermore, to mitigate confounding bias that may cause the model to learn spurious correlations instead of meaningful causal relationships, CausalCLIPSeg introduces a causal intervention module which self-annotates confounders and excavates causal features from inputs for segmentation judgments. We also devise an adversarial min-max game to optimize causal features while penalizing confounding ones. Extensive experiments demonstrate the state-of-the-art performance of our proposed method. Code is available at https://github.com/WUTCM-Lab/CausalCLIPSeg.
Submitted 20 March, 2025;
originally announced March 2025.
-
PointSFDA: Source-free Domain Adaptation for Point Cloud Completion
Authors:
Xing He,
Zhe Zhu,
Liangliang Nan,
Honghua Chen,
Jing Qin,
Mingqiang Wei
Abstract:
Conventional methods for point cloud completion, typically trained on synthetic datasets, face significant challenges when applied to out-of-distribution real-world scans. In this paper, we propose an effective yet simple source-free domain adaptation framework for point cloud completion, termed PointSFDA. Unlike unsupervised domain adaptation that reduces the domain gap by directly leveraging labeled source data, PointSFDA uses only a pretrained source model and unlabeled target data for adaptation, avoiding the need for inaccessible source data in practical scenarios. Being the first source-free domain adaptation architecture for point cloud completion, our method offers two core contributions. First, we introduce a coarse-to-fine distillation solution to explicitly transfer the global geometry knowledge learned from the source dataset. Second, as noise may be introduced due to domain gaps, we propose a self-supervised partial-mask consistency training strategy to learn local geometry information in the target domain. Extensive experiments have validated that our method significantly improves the performance of state-of-the-art networks in cross-domain shape completion. Our code is available at https://github.com/Starak-x/PointSFDA.
Submitted 19 March, 2025;
originally announced March 2025.
-
Remote preparation of motional Schrödinger cat states via dissipatively-driven non-Gaussian mechanical entanglement
Authors:
Zunbo Yu,
Miaomiao Wei,
Huatang Tan
Abstract:
In this paper, we propose a driven-dissipative scheme for generating non-Gaussian mechanical entangled states and remotely preparing mechanical Schrödinger cat states via the entanglement. The system under study consists of a cavity optomechanical setup with two frequency-mismatched mechanical oscillators coupled to a cavity field driven by a bichromatic pump. We show that under proper conditions, an effective Hamiltonian for nondegenerate parametric downconversion involving the two mechanical oscillators and the cavity field can be engineered. We demonstrate analytically and numerically that the cavity dissipation drives the mechanical oscillators into a steady-state pair-coherent state. The non-Gaussianity and nonclassical properties, including Wigner negativity, entanglement and quantum steering, of the achieved non-Gaussian mechanical state are investigated in detail. We further show that homodyne detection on one mechanical oscillator enables the remote generation of Schrödinger cat states in the other oscillator through the non-Gaussian mechanical entanglement. As we show, this detection can be implemented by transferring the mechanical state to the output field of an auxiliary probe cavity coupled to the target oscillator, followed by homodyne detection on the output field. We also discuss the robustness of the mechanical entangled states and cat states against thermal fluctuations. Our findings establish a feasible approach for the dissipative and remote preparation of mechanical nonclassical states.
Submitted 14 March, 2025;
originally announced March 2025.
-
ACMo: Attribute Controllable Motion Generation
Authors:
Mingjie Wei,
Xuemei Xie,
Guangming Shi
Abstract:
Attributes such as style, fine-grained text, and trajectory are specific conditions for describing motion. However, existing methods often lack precise user control over motion attributes and suffer from limited generalizability to unseen motions. This work introduces an Attribute Controllable Motion generation architecture to address these challenges by decoupling any conditions and controlling them separately. First, we explore the Attribute Diffusion Model to improve text-to-motion performance by decoupling text and motion learning, as the controllable model relies heavily on the pre-trained model. Then, we introduce the Motion Adapter to quickly finetune previously unseen motion patterns. Its motion-prompt inputs achieve multimodal text-to-motion generation that captures user-specified styles. Finally, we propose an LLM Planner to bridge the gap between unseen attributes and dataset-specific texts via local knowledge for user-friendly interaction. Our approach introduces the capability of using motion prompts for stylized generation, enabling fine-grained and user-friendly attribute control while providing performance comparable to state-of-the-art methods. Project page: https://mjwei3d.github.io/ACMo/
Submitted 13 March, 2025;
originally announced March 2025.
-
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Authors:
Hao He,
Ceyuan Yang,
Shanchuan Lin,
Yinghao Xu,
Meng Wei,
Liangke Gui,
Qi Zhao,
Gordon Wetzstein,
Lu Jiang,
Hongsheng Li
Abstract:
This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamics and a limited range of viewpoints when generating videos with large camera movement. We take an approach that progressively expands the generation of dynamic scenes -- first enhancing dynamic content within individual video clips, then extending this capability to create seamless explorations across broad viewpoint ranges. Specifically, we construct a dataset featuring a large degree of dynamics with camera parameter annotations for training while designing a lightweight camera injection module and training scheme to preserve the dynamics of the pretrained models. Building on these improved single-clip techniques, we enable extended scene exploration by allowing users to iteratively specify camera trajectories for generating coherent video sequences. Experiments across diverse scenarios demonstrate that CameraCtrl II enables camera-controlled dynamic scene synthesis with substantially wider spatial exploration than previous approaches.
Submitted 13 March, 2025;
originally announced March 2025.
-
Steady-state tripartite non-Gaussian entanglement and steering in output field from intracavity triple-photon parametric downconversion
Authors:
Miaomiao Wei,
Huatang Tan
Abstract:
Nondegenerate triple-photon parametric downconversion (NTPD) is a potential source for unconditional tripartite non-Gaussian entangled states of continuous variables. A recent experiment has demonstrated strong third-order correlations among bright photon triplets via microwave NTPD in a superconducting cavity [Phys. Rev. X 10, 011011 (2020)]. Previous theoretical works have revealed that only short-time genuine tripartite non-Gaussian entanglement can be generated in NTPD even in the absence of dissipation. In this paper, we investigate the properties of tripartite non-Gaussian entanglement and steering in the cavity output field by taking the cavity dissipation into account. We first derive experimentally detectable criteria for fully inseparable and genuine tripartite non-Gaussian entanglement and steering. With the criteria, we then find that steady-state tripartite non-Gaussian entanglement and steering can be generated in the output field, although they merely exist in the short-time regime inside the cavity. We also find that initial cavity-field coherent states can clearly enhance the steady-state and transient tripartite entanglement and steering, in comparison to the case of initial vacuum states. We finally show that the output tripartite non-Gaussian steerable correlations can be applied to the remote generation of negative Wigner-function quantum states by homodyne detection.
Submitted 10 March, 2025;
originally announced March 2025.
-
Optomechanical non-Gaussian quantum steering and remote preparation of large-size motional Schrödinger cat states
Authors:
Miaomiao Wei,
Huatang Tan
Abstract:
In this paper, we present a scheme for remotely generating large-size motional Schrödinger cat states in cavity optomechanical (OM) systems with non-Gaussian quantum steering of continuous variables. We consider that the output field from the OM cavity undergoes three typical kinds of multiphoton operations: multiphoton subtraction, multiphoton addition, or multiphoton catalysis, followed by homodyne detection. We first demonstrate that these multiphoton operations can lead to non-Gaussian OM quantum steerable correlations, which are unveiled by the subsequent homodyne detection with a Fisher-information-based steering criterion. It is found that the non-Gaussian steering is obviously enhanced with an increasing number $n$ of photons in the multiphoton operations, which, as we show, fails to be revealed with the well-known Reid's steering criterion. It therefore suggests that the Fisher-information-based criterion is more effective for witnessing non-Gaussian quantum steering. We next show that the strong OM steering enables the remote preparation of large-size Schrödinger odd or even cat states of the mechanical oscillator by the homodyne detection. Accordingly, the amplitudes of the cat states also increase significantly with the photon number $n$, particularly in the cases of multiphoton subtraction and addition. Our results reveal the properties of non-Gaussian steering generated by multiphoton operations, and the large cat states of macroscopic mechanical resonators hold promise for fundamental tests in quantum mechanics and practical applications in quantum science.
Submitted 10 March, 2025;
originally announced March 2025.
-
Boltzmann Attention Sampling for Image Analysis with Small Objects
Authors:
Theodore Zhao,
Sid Kiblawi,
Naoto Usuyama,
Ho Hin Lee,
Sam Preston,
Hoifung Poon,
Mu Wei
Abstract:
Detecting and segmenting small objects, such as lung nodules and tumor lesions, remains a critical challenge in image analysis. These objects often occupy less than 0.1% of an image, making traditional transformer architectures inefficient and prone to performance degradation due to redundant attention computations on irrelevant regions. Existing sparse attention mechanisms rely on rigid hierarchical structures, which are poorly suited for detecting small, variable, and uncertain object locations. In this paper, we propose BoltzFormer, a novel transformer-based architecture designed to address these challenges through dynamic sparse attention. BoltzFormer identifies and focuses attention on relevant areas by modeling uncertainty using a Boltzmann distribution with an annealing schedule. Initially, a higher temperature allows broader area sampling in early layers, when object location uncertainty is greatest. As the temperature decreases in later layers, attention becomes more focused, enhancing efficiency and accuracy. BoltzFormer seamlessly integrates into existing transformer architectures via a modular Boltzmann attention sampling mechanism. Comprehensive evaluations on benchmark datasets demonstrate that BoltzFormer significantly improves segmentation performance for small objects while reducing attention computation by an order of magnitude compared to previous state-of-the-art methods.
Submitted 26 March, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
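The temperature-annealed sampling described in the BoltzFormer abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `boltzmann_sample` and `annealed_temperatures`, the geometric form of the schedule, and the use of a plain per-patch relevance score are all assumptions made for illustration. The abstract only states that a Boltzmann distribution is sampled with a temperature that decreases across layers.

```python
import numpy as np


def boltzmann_sample(scores, temperature, num_samples, rng):
    """Sample patch indices with probability proportional to exp(score / T).

    `scores` are hypothetical per-patch relevance scores; how BoltzFormer
    actually computes them is not given in the abstract.
    """
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    # Draw distinct patch indices according to the Boltzmann probabilities.
    idx = rng.choice(len(probs), size=num_samples, replace=False, p=probs)
    return idx, probs


def annealed_temperatures(t_start, t_end, num_layers):
    """Geometric schedule from t_start down to t_end (an assumed form)."""
    steps = np.arange(num_layers) / max(num_layers - 1, 1)
    return t_start * (t_end / t_start) ** steps
```

With a high temperature the probabilities are close to uniform, so early layers sample broadly; as the temperature falls, the distribution concentrates on the highest-scoring patches, matching the qualitative behaviour the abstract describes.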
-
STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds
Authors:
Zikuan Li,
Honghua Chen,
Yuecheng Wang,
Sibo Wu,
Mingqiang Wei,
Jun Wang
Abstract:
Extracting geometric edges from unstructured point clouds remains a significant challenge, particularly in thin-walled structures that are commonly found in everyday objects. Traditional geometric methods and recent learning-based approaches frequently struggle with these structures, as both rely heavily on sufficient contextual information from local point neighborhoods. However, 3D measurement data of thin-walled structures often lack the accurate, dense, and regular neighborhood sampling required for reliable edge extraction, resulting in degraded performance.
In this work, we introduce STAR-Edge, a novel approach designed for detecting and refining edge points in thin-walled structures. Our method leverages a unique representation, the local spherical curve, to create structure-aware neighborhoods that emphasize co-planar points while reducing interference from close-by, non-co-planar surfaces. This representation is transformed into a rotation-invariant descriptor, which, combined with a lightweight multi-layer perceptron, enables robust edge point classification even in the presence of noise and sparse or irregular sampling. We also use the local spherical curve representation to estimate more precise normals and introduce an optimization function to project initially identified edge points exactly onto the true edges. Experiments conducted on the ABC dataset and thin-walled structure-specific datasets demonstrate that STAR-Edge outperforms existing edge detection methods, showing better robustness under various challenging conditions.
Submitted 2 March, 2025;
originally announced March 2025.
-
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
Authors:
An Li,
Zhe Zhu,
Mingqiang Wei
Abstract:
Existing point cloud completion methods, which typically depend on predefined synthetic training datasets, encounter significant challenges when applied to out-of-distribution, real-world scans. To overcome this limitation, we introduce a zero-shot completion framework, termed GenPC, designed to reconstruct high-quality real-world scans by leveraging explicit 3D generative priors. Our key insight is that recent feed-forward 3D generative models, trained on extensive internet-scale data, have demonstrated the ability to perform 3D generation from single-view images in a zero-shot setting. To harness this for completion, we first develop a Depth Prompting module that links partial point clouds with image-to-3D generative models by leveraging depth images as a stepping stone. To retain the original partial structure in the final results, we design the Geometric Preserving Fusion module that aligns the generated shape with the input by adaptively adjusting its pose and scale. Extensive experiments on widely used benchmarks validate the superiority and generalizability of our approach, bringing us a step closer to robust real-world scan completion.
Submitted 27 February, 2025;
originally announced February 2025.
-
PointSea: Point Cloud Completion via Self-structure Augmentation
Authors:
Zhe Zhu,
Honghua Chen,
Xing He,
Mingqiang Wei
Abstract:
Point cloud completion is a fundamental yet not well-solved problem in 3D vision. Current approaches often rely on 3D coordinate information and/or additional data (e.g., images and scanning viewpoints) to fill in missing parts. Unlike these methods, we explore self-structure augmentation and propose PointSea for global-to-local point cloud completion. In the global stage, we draw on how one inspects a defective region of a physical object: observing it from various perspectives gives a better understanding. Inspired by this, PointSea augments data representation by leveraging self-projected depth images from multiple views. To reconstruct a compact global shape from the cross-modal input, we incorporate a feature fusion module to fuse features at both intra-view and inter-view levels. In the local stage, to reveal highly detailed structures, we introduce a point generator called the self-structure dual-generator. This generator integrates both learned shape priors and geometric self-similarities for shape refinement. Unlike existing efforts that apply a unified strategy for all points, our dual-path design adapts refinement strategies conditioned on the structural type of each point, addressing the specific incompleteness of each point. Comprehensive experiments on widely-used benchmarks demonstrate that PointSea effectively understands global shapes and generates local details from incomplete input, showing clear improvements over existing methods.
Submitted 26 February, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
Ultra-high-energy $γ$-ray emission associated with the tail of a bow-shock pulsar wind nebula
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (274 additional authors not shown)
Abstract:
In this study, we present a comprehensive analysis of an unidentified point-like ultra-high-energy (UHE) $γ$-ray source, designated as 1LHAASO J1740+0948u, situated in the vicinity of the middle-aged pulsar PSR J1740+1000. The detection significance reached 17.1$σ$ (9.4$σ$) above 25$\,$TeV (100$\,$TeV). The source energy spectrum extended up to 300$\,$TeV, which was well fitted by a log-parabola function with $N_0 = (1.93\pm0.23) \times 10^{-16} \rm{TeV^{-1}\,cm^{-2}\,s^{-1}}$, $α= 2.14\pm0.27$, and $β= 1.20\pm0.41$ at $E_0 = 30\,$TeV. The associated pulsar, PSR J1740+1000, resides at a high galactic latitude and powers a bow-shock pulsar wind nebula (BSPWN) with an extended X-ray tail. The best-fit position of the $γ$-ray source appeared to be shifted by $0.2^{\circ}$ with respect to the pulsar position. As (i) the currently identified pulsar halos do not demonstrate such offsets, and (ii) the centroid of the $γ$-ray emission is approximately located at the extension of the X-ray tail, we speculate that the UHE $γ$-ray emission may originate from re-accelerated electron/positron pairs that are advected away in the bow-shock tail.
Submitted 24 February, 2025; v1 submitted 21 February, 2025;
originally announced February 2025.
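For readers unfamiliar with the log-parabola fit quoted in the abstract above, the commonly used form of this spectral shape is sketched below. The functional form is the standard convention rather than something spelled out in the abstract, and the base of the logarithm (natural or base 10) is an assumption that would need to be checked against the paper itself.

```latex
% Log-parabola differential spectrum (standard convention; log base assumed)
\frac{dN}{dE} = N_0 \left( \frac{E}{E_0} \right)^{-\left[ \alpha + \beta \log\left( E / E_0 \right) \right]},
\qquad E_0 = 30\,\mathrm{TeV}
```

Here $N_0$ is the differential flux at the reference energy $E_0$, $α$ is the local spectral index at $E_0$, and $β$ measures the curvature: a positive $β$ means the spectrum steepens with increasing energy.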
-
General method for calculating transport properties of disordered mesoscopic systems based on the nonequilibrium Green's function formalism
Authors:
Gaoyang Li,
MiaoMiao Wei,
Fuming Xu,
Jian Wang
Abstract:
Disorder scattering plays important roles in quantum transport as well as various Hall effects, including the second-order nonlinear Hall effect induced by Berry curvature dipole. Calculation of disorder-averaged transport properties usually requires substantial computational resources, especially for higher-order effects. Existing methods are either limited by approximation conditions or constrained by numerical stability, making it difficult to conveniently obtain average physical quantities over a wide range of disorder strength. In this work, we develop a general method for noninteracting systems to obtain analytical expressions of disorder averages in finite orders of disorder strength. This method utilizes the Dyson equation to expand physical quantities expressed in terms of the Green's functions into series of disorder-averaged matrices, and the only approximation involved is the truncation of the Dyson equation. Therefore, this method not only avoids the brute force calculation of disorder samples, but also widely applies to different model systems, types of disorder, and numbers of Green's functions in the expressions. We demonstrate the applicability of this general method by calculating averages of the linear conductance of a two-terminal system, the spin Hall conductance and the second-order nonlinear conductance of four-terminal Hall setups. It is found that truncation at the fourth order of disorder strength provides a reasonable accuracy and a convenient Padé treatment effectively extends its applicable range. Numerical results also confirm disorder enhancement of the second-order nonlinear Hall current in four-terminal systems. Moreover, more accurate predictions for a broader range of disorder strength can be achieved by including higher-order terms in a similar manner.
Submitted 13 February, 2025;
originally announced February 2025.
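The Dyson-equation expansion mentioned in the abstract above can be written schematically as follows. The symbols are assumed notation, not taken from the paper: $G_0$ denotes the Green's function of the clean system, $V$ the disorder potential, and $\langle\cdot\rangle$ the average over disorder configurations. The sketch shows only the expansion of a single Green's function truncated at fourth order in $V$; the paper's treatment of products of several Green's functions is not reproduced here.

```latex
% Dyson equation and its iterated series (schematic; notation assumed)
G = G_0 + G_0 V G
  = G_0 + G_0 V G_0 + G_0 V G_0 V G_0 + \cdots

% Disorder average truncated at fourth order in V
\langle G \rangle \approx G_0
  + G_0 \langle V \rangle G_0
  + G_0 \langle V G_0 V \rangle G_0
  + G_0 \langle V G_0 V G_0 V \rangle G_0
  + G_0 \langle V G_0 V G_0 V G_0 V \rangle G_0
```

For a zero-mean disorder distribution the first-order term $G_0 \langle V \rangle G_0$ vanishes, so the leading correction is second order in the disorder strength.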
-
Spin separation and filtering assisted by topological corner states in the Kekulé lattice
Authors:
Kai-Tong Wang,
Hui Wang,
Shijie Liu,
Miaomiao Wei,
Fuming Xu
Abstract:
Higher-order topological corner states have been realized in the two-dimensional Kekulé lattice, and they can be further coupled with spin polarization through the implementation of local magnetization. In this work, we numerically investigate the spin-dependent transport properties assisted by topological corner states in the Kekulé lattice. By applying local magnetization and electric potential, the topological corner states are spin polarized with opposite spins localized at different corners, thereby demonstrating a spin-corner state locking mechanism. Transport characteristics, including transmission, local density of states, and local current density, are calculated for a two-terminal setup consisting of a diamond-shaped Kekulé lattice connected to two leads. When opposite local magnetization is applied to the corners, spin-up and spin-down electrons are perfectly separated, forming two spin-polarized conducting channels and leading to spin spatial separation. In the presence of identical local magnetization on both corners and an electric potential at one corner, the spin-polarized corner states can facilitate selective filtering of different spins and generate spin-polarized currents by tuning the energy. Furthermore, spin-resolved transmission diagrams as functions of both the Fermi energy and electric potential are presented, illustrating the global distribution of spin filtering through topological corner states.
Submitted 13 February, 2025;
originally announced February 2025.
-
Broadband $γ$-ray spectrum of supernova remnant Cassiopeia A
Authors:
Zhen Cao,
F. Aharonian,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
C. M. Cai,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen,
S. H. Chen,
S. Z. Chen
, et al. (293 additional authors not shown)
Abstract:
The core-collapse supernova remnant (SNR) Cassiopeia A (Cas A) is one of the brightest galactic radio sources with an angular radius of $\sim 2.5'$. Although no extension of this source has been detected in the $γ$-ray band, using more than 1000 days of LHAASO data above $\sim 0.8$ TeV, we find that its spectrum is significantly softer than those obtained with Imaging Air Cherenkov Telescopes (IACTs) and its flux near $\sim 1$ TeV is about two times higher. In combination with analyses of more than 16 years of Fermi-LAT data covering $0.1 \, \mathrm{GeV} - 1 \, \mathrm{TeV}$, we find that the spectrum above 30 GeV deviates significantly from a single power-law, and is best described by a smoothly broken power-law with a spectral index of $1.90 \pm 0.15_\mathrm{stat}$ ($3.41 \pm 0.19_\mathrm{stat}$) below (above) a break energy of $0.63 \pm 0.21_\mathrm{stat} \, \mathrm{TeV}$. Given differences in the angular resolution of LHAASO-WCDA and IACTs, TeV $γ$-ray emission detected with LHAASO may have a significant contribution from regions surrounding the SNR illuminated by particles accelerated earlier, which, however, are treated as background by IACTs. Detailed modelling can be used to constrain acceleration processes of TeV particles in the early stage of SNR evolution.
Submitted 7 February, 2025;
originally announced February 2025.
-
MapFusion: A Novel BEV Feature Fusion Network for Multi-modal Map Construction
Authors:
Xiaoshuai Hao,
Yunfeng Diao,
Mengchuan Wei,
Yifan Yang,
Peng Hao,
Rong Yin,
Hui Zhang,
Weiming Li,
Shu Zhao,
Yu Liu
Abstract:
The map construction task plays a vital role in providing precise and comprehensive static environmental information essential for autonomous driving systems. Primary sensors include cameras and LiDAR, with configurations varying between camera-only, LiDAR-only, or camera-LiDAR fusion, based on cost-performance considerations. While fusion-based methods typically perform best, existing approaches often neglect modality interaction and rely on simple fusion strategies, which suffer from misalignment and information loss. To address these issues, we propose MapFusion, a novel multi-modal Bird's-Eye View (BEV) feature fusion method for map construction. Specifically, to solve the semantic misalignment problem between camera and LiDAR BEV features, we introduce the Cross-modal Interaction Transform (CIT) module, enabling interaction between two BEV feature spaces and enhancing feature representation through a self-attention mechanism. Additionally, we propose an effective Dual Dynamic Fusion (DDF) module to adaptively select valuable information from different modalities, which can take full advantage of the inherent information between different modalities. Moreover, MapFusion is designed to be simple and plug-and-play, easily integrated into existing pipelines. We evaluate MapFusion on two map construction tasks, High-definition (HD) map construction and BEV map segmentation, to show its versatility and effectiveness. Compared with the state-of-the-art methods, MapFusion achieves 3.6% and 6.2% absolute improvements on the HD map construction and BEV map segmentation tasks on the nuScenes dataset, respectively, demonstrating the superiority of our approach.
Submitted 5 February, 2025;
originally announced February 2025.
-
Towards Consistent and Controllable Image Synthesis for Face Editing
Authors:
Mengting Wei,
Tuomas Varanka,
Yante Li,
Xingxun Jiang,
Huai-Qian Khor,
Guoying Zhao
Abstract:
Face editing methods, essential for tasks like virtual avatars, digital human synthesis and identity preservation, have traditionally been built upon GAN-based techniques, while recent focus has shifted to diffusion-based models due to their success in image reconstruction. However, diffusion models still face challenges in controlling specific attributes and preserving the consistency of other unchanged attributes, especially the identity characteristics. To address these issues and facilitate more convenient editing of face images, we propose a novel approach that leverages the power of Stable-Diffusion (SD) models and crude 3D face models to control the lighting, facial expression and head pose of a portrait photo. We observe that this task essentially involves combining the target background, identity and the face attributes to be edited. We strive to sufficiently disentangle the control of these factors to enable consistency of face editing. Specifically, our method, coined RigFace, contains: 1) A Spatial Attribute Encoder that provides precise and decoupled conditions of background, pose, expression and lighting; 2) A high-consistency FaceFusion method that transfers identity features from the Identity Encoder to the denoising UNet of a pre-trained SD model; 3) An Attribute Rigger that injects those conditions into the denoising UNet. Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models. Code is publicly available at https://github.com/weimengting/RigFace.
Submitted 9 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Universal Abstraction: Harnessing Frontier Models to Structure Real-World Data at Scale
Authors:
Cliff Wong,
Sam Preston,
Qianchu Liu,
Zelalem Gero,
Jass Bagga,
Sheng Zhang,
Shrey Jain,
Theodore Zhao,
Yu Gu,
Yanbo Xu,
Sid Kiblawi,
Roshanthi Weerasinghe,
Rom Leidner,
Kristina Young,
Brian Piening,
Carlo Bifulco,
Tristan Naumann,
Mu Wei,
Hoifung Poon
Abstract:
The vast majority of real-world patient information resides in unstructured clinical text, and the process of medical abstraction seeks to extract and normalize structured information from this unstructured input. However, traditional medical abstraction methods can require significant manual efforts that can include crafting rules or annotating training labels, limiting scalability. In this paper, we propose UniMedAbstractor (UMA), a zero-shot medical abstraction framework leveraging Large Language Models (LLMs) through a modular and customizable prompt template. We refer to our approach as universal abstraction as it can quickly scale to new attributes through its universal prompt template without curating attribute-specific training labels or rules. We evaluate UMA for oncology applications, focusing on fifteen key attributes representing the cancer patient journey, from short-context attributes (e.g., performance status, treatment) to complex long-context attributes requiring longitudinal reasoning (e.g., tumor site, histology, TNM staging). Experiments on real-world data show UMA's strong performance and generalizability. Compared to supervised and heuristic baselines, UMA with GPT-4o achieves on average an absolute 2-point F1/accuracy improvement for both short-context and long-context attribute abstraction. For pathologic T staging, UMA even outperforms the supervised model by 20 points in accuracy.
Submitted 2 February, 2025;
originally announced February 2025.
-
Constructing AI ethics narratives based on real-world data: Human-AI collaboration in data-driven visual storytelling
Authors:
Mengyi Wei,
Chenjing Jiao,
Chenyu Zuo,
Lorenz Hurni,
Liqiu Meng
Abstract:
AI ethics narratives have the potential to shape the public's accurate understanding of AI technologies and promote communication among different stakeholders. However, AI ethics narratives are largely lacking. The limited existing narratives tend to center on works of science fiction or corporate marketing campaigns of large technology companies. Misuse of the "socio-technical imaginary" can blur the line between speculation and reality for the public, undermining the responsibility and regulation of technology development. Therefore, constructing authentic AI ethics narratives is an urgent task. The emergence of generative AI offers new possibilities for building narrative systems. This study is dedicated to data-driven visual storytelling about AI ethics relying on human-AI collaboration. Based on the five key elements of story models, we proposed a conceptual framework for human-AI collaboration and explored the roles of generative AI and humans in the creation of visual stories. We implemented the conceptual framework in a real AI news case. This research leveraged advanced generative AI technologies to provide a reference for constructing genuine AI ethics narratives. Our goal is to promote active public engagement and discussions through authentic AI ethics narratives, thereby contributing to the development of better AI policies.
Submitted 1 February, 2025;
originally announced February 2025.
-
Resilient Endurance-Aware NVM-based PUF against Learning-based Attacks
Authors:
Hassan Nassar,
Ming-Liang Wei,
Chia-Lin Yang,
Jörg Henkel,
Kuan-Hsun Chen
Abstract:
Physical Unclonable Functions (PUFs) based on Non-Volatile Memory (NVM) technology have emerged as a promising solution for secure authentication and cryptographic applications. By leveraging the multi-level cell (MLC) characteristic of NVMs, these PUFs can generate a wide range of unique responses, enhancing their resilience to machine learning (ML) modeling attacks. However, a significant issue with NVM-based PUFs is their endurance problem; frequent write operations lead to wear and degradation over time, reducing the reliability and lifespan of the PUF.
This paper addresses these issues by offering a comprehensive model to predict and analyze the effects of endurance changes on NVM PUFs. This model provides insights into how wear impacts the PUF's quality and helps in designing more robust PUFs. Building on this model, we present a novel design for NVM PUFs that significantly improves endurance. Our design approach incorporates advanced techniques to distribute write operations more evenly and reduce stress on individual cells. The result is an NVM PUF that demonstrates a $62\times$ improvement in endurance compared to current state-of-the-art solutions while maintaining protection against learning-based attacks.
Submitted 10 January, 2025;
originally announced January 2025.
-
Mapping Transient Structures of Cyclo[18]Carbon by Computational X-Ray Spectra
Authors:
Minrui Wei,
Sheng-Yu Wang,
Jun-Rong Zhang,
Lu Zhang,
Guoyan Ge,
Zeyu Liu,
Weijie Hua
Abstract:
The structure of cyclo[18]carbon (C$_{18}$), whether in its polyynic form with bond length alternation (BLA) or its cumulenic form without BLA, has long fascinated researchers, even prior to its successful synthesis. Recent studies suggest a polyynic ground state and a cumulenic transient state; however, the dynamics remain unclear and lack experimental validation. This study presents a first-principles theoretical investigation of the bond lengths ($R_1$ and $R_2$) dependent two-dimensional potential energy surfaces (PESs) of C$_{18}$, concentrating on the ground state and carbon 1s ionized and excited states. We examine the potential of X-ray spectra for determining bond lengths and monitoring transient structures, finding that both X-ray photoelectron (XPS) and absorption (XAS) spectra are sensitive to these variations. Utilizing a library of ground-state minimum structures optimized with 14 different functionals, we observe that core binding energies predicted with the $ω$B97XD functional can vary by 0.9 eV (290.3--291.2 eV). Unlike the ground state PES, which predicts minima at alternating bond lengths, the C1s ionized state PES predicts minima with equivalent bond lengths. In the XAS spectra, peaks 1$π^*$ and 2$π^*$ show a redshift with increasing bond lengths along the line where $R_1 = R_2$. Additionally, increasing $R_2$ (with $R_1$ fixed) results in an initial redshift followed by a blueshift, minimizing at $R_1 = R_2$. Major peaks indicate that both 1$π^*$ and 2$π^*$ arise from two channels: C1s$\rightarrowπ^*_{z}$ (out-of-plane) and C1s$\rightarrowπ^*_{xy}$ (in-plane) transitions at coinciding energies.
Submitted 5 January, 2025;
originally announced January 2025.
-
MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
Authors:
Mengting Wei,
Tuomas Varanka,
Xingxun Jiang,
Huai-Qian Khor,
Guoying Zhao
Abstract:
We address the problem of facial expression editing by controlling the relative variation of facial action-unit (AU) from the same person. This enables us to edit this specific person's expression in a fine-grained, continuous and interpretable manner, while preserving their identity, pose, background and detailed facial attributes. Key to our model, which we dub MagicFace, is a diffusion model conditioned on AU variations and an ID encoder to preserve facial details of high consistency. Specifically, to preserve the facial details with the input identity, we leverage the power of pretrained Stable-Diffusion models and design an ID encoder to merge appearance features through self-attention. To keep background and pose consistency, we introduce an efficient Attribute Controller by explicitly informing the model of current background and pose of the target. By injecting AU variations into a denoising UNet, our model can animate arbitrary identities with various AU combinations, yielding superior results in high-fidelity expression editing compared to other facial expression editing works. Code is publicly available at https://github.com/weimengting/MagicFace.
Submitted 9 January, 2025; v1 submitted 4 January, 2025;
originally announced January 2025.
-
Enhanced Atom-by-Atom Assembly of Defect-Free Two-Dimensional Mixed-Species Atomic Arrays
Authors:
Ming-Rui Wei,
Kun-Peng Wang,
Jia-Yi Hou,
Yi Chen,
Peng Xu,
Jun Zhuang,
Rui-Jun Guo,
Min Liu,
Jin Wang,
Xiao-Dong He,
Ming-Sheng Zhan
Abstract:
Defect-free single-atom arrays in optical tweezers are a promising platform for scalable quantum computing, quantum simulation, and quantum metrology. Extending single-species arrays to mixed-species ones promises to offer new possibilities. In our recent proof-of-principle realization of defect-free two-dimensional assembly of mixed-species $^{85}$Rb ($^{87}$Rb) atom arrays [C. Sheng et al., Phys. Rev. Lett. 128, 083202 (2022), https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.128.083202], the filling fractions were limited by the imperfect transfer of atoms and the occurrence of logjams during the atom rearrangement. In order to scale up the size of the defect-free mixed-species atom array, we scale up the tweezer array, improve the atom transfer, and upgrade the heuristic heteronuclear algorithm so as to facilitate multiple rearrangement cycles. Consequently, we successfully create defect-free atom arrays with 120 mixed-species single atoms. The corresponding filling fraction and defect-free probability are improved to 98.6(1)% and 14(2)%, respectively. It is anticipated that the enhanced algorithm can be extended to other combinations of atomic species, and this mixed-species atom array is ready for studies of many-body physics, quantum error correction, and quantum metrology.
Submitted 9 January, 2025; v1 submitted 4 January, 2025;
originally announced January 2025.
-
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Authors:
Jianyi Wang,
Zhijie Lin,
Meng Wei,
Yang Zhao,
Ceyuan Yang,
Fei Xiao,
Chen Change Loy,
Lu Jiang
Abstract:
Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild. Despite recent advances in diffusion-based restoration, these methods often face limitations in generation capability and sampling efficiency. In this work, we present SeedVR, a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution. The core design of SeedVR lies in the shifted window attention that facilitates effective restoration on long video sequences. SeedVR further supports variable-sized windows near the boundary of both spatial and temporal dimensions, overcoming the resolution constraints of traditional window attention. Equipped with contemporary practices, including causal video autoencoder, mixed image and video training, and progressive training, SeedVR achieves highly-competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos. Extensive experiments demonstrate SeedVR's superiority over existing methods for generic video restoration.
Submitted 22 March, 2025; v1 submitted 2 January, 2025;
originally announced January 2025.
-
MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception
Authors:
Xiaoshuai Hao,
Guanqun Liu,
Yuting Zhao,
Yuheng Ji,
Mengchuan Wei,
Haimei Zhao,
Lingdong Kong,
Rong Yin,
Yu Liu
Abstract:
Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction. These models provide essential and comprehensive static environmental information for autonomous driving systems. While camera-LiDAR fusion methods have shown promising results by integrating data from both modalities, they often depend on complete sensor inputs. This reliance can lead to low robustness and potential failures when sensors are corrupted or missing, raising significant safety concerns. To tackle this challenge, we introduce the Multi-Sensor Corruption Benchmark (MSC-Bench), the first comprehensive benchmark aimed at evaluating the robustness of multi-sensor autonomous driving perception models against various sensor corruptions. Our benchmark includes 16 combinations of corruption types that disrupt both camera and LiDAR inputs, either individually or concurrently. Extensive evaluations of six 3D object detection models and four HD map construction models reveal substantial performance degradation under adverse weather conditions and sensor failures, underscoring critical safety issues. The benchmark toolkit and affiliated code and model checkpoints have been made publicly accessible.
Submitted 1 January, 2025;
originally announced January 2025.
-
Improving Acoustic Scene Classification in Low-Resource Conditions
Authors:
Zhi Chen,
Yun-Fei Shao,
Yong Ma,
Mingsheng Wei,
Le Zhang,
Wei-Qiang Zhang
Abstract:
Acoustic Scene Classification (ASC) identifies an environment based on an audio signal. This paper explores ASC in low-resource conditions and proposes a novel model, DS-FlexiNet, which combines depthwise separable convolutions from MobileNetV2 with ResNet-inspired residual connections for a balance of efficiency and accuracy. To address hardware limitations and device heterogeneity, DS-FlexiNet employs Quantization Aware Training (QAT) for model compression and data augmentation methods like Auto Device Impulse Response (ADIR) and Freq-MixStyle (FMS) to improve cross-device generalization. Knowledge Distillation (KD) from twelve teacher models further enhances performance on unseen devices. The architecture includes a custom Residual Normalization layer to handle domain differences across devices, and depthwise separable convolutions reduce computational overhead without sacrificing feature representation. Experimental results show that DS-FlexiNet excels in both adaptability and performance under resource-constrained conditions.
Submitted 27 April, 2025; v1 submitted 30 December, 2024;
originally announced December 2024.
-
Predicting Accurate X-ray Absorption Spectra for CN$^+$, CN, and CN$^-$: Insights from Multiconfigurational and Density Functional Simulations
Authors:
Jinyu Li,
Sheng-Yu Wang,
Lu Zhang,
Guoyan Ge,
Minrui Wei,
Junxiang Zuo,
Weijie Hua
Abstract:
High-resolution X-ray spectroscopy is an essential tool in X-ray astronomy, enabling detailed studies of celestial objects and their physical and chemical properties. However, comprehensive mapping of high-resolution X-ray spectra for even simple interstellar and circumstellar molecules is still lacking. In this study, we conducted systematic quantum chemical simulations to predict the C1s X-ray absorption spectra of CN$^+$, CN, and CN$^-$. Our findings provide valuable references for both X-ray astronomy and laboratory studies. We assigned the first electronic peak of CN$^+$ and CN to C1s $\rightarrow σ^*$ transitions, while the peak for CN$^-$ corresponds to a C1s $\rightarrow π^*$ transition. We explained that the two-fold degeneracy ($π^*_{xz}$ and $π^*_{yz}$) of the C1s$\rightarrowπ^*$ transitions results in CN$^-$ exhibiting a significantly stronger first absorption compared to the other two systems. We further calculated the vibronic fine structures for these transitions using the quantum wavepacket method based on multiconfigurational-level, anharmonic potential energy curves, revealing distinct energy positions for the 0-0 absorptions at 280.7 eV, 279.6 eV, and 285.8 eV. Each vibronic profile features a prominent 0-0 peak, showing overall similarity but differing intensity ratios of the 0-0 and 0-1 peaks. Notably, introducing a C1s core hole leads to shortened C-N bond lengths and increased vibrational frequencies across all species. These findings enhance our understanding of the electronic structures and X-ray spectra of carbon-nitrogen species, emphasizing the influence of charge state on X-ray absorptions.
Submitted 27 March, 2025; v1 submitted 26 December, 2024;
originally announced December 2024.
-
Efficient MedSAMs: Segment Anything in Medical Images on Laptop
Authors:
Jun Ma,
Feifei Li,
Sumin Kim,
Reza Asakereh,
Bao-Hiep Le,
Dang-Khoa Nguyen-Vu,
Alexander Pfefferle,
Muxin Wei,
Ruochen Gao,
Donghang Lyu,
Songxiao Yang,
Lennart Purucker,
Zdravko Marinov,
Marius Staring,
Haisheng Lu,
Thuy Thanh Dao,
Xincheng Ye,
Zhi Li,
Gianluca Brugnara,
Philipp Vollmuth,
Martha Foltyn-Dumitru,
Jaeyoung Cho,
Mustafa Ahmed Mahmutoglu,
Martin Bendszus,
Irada Pflüger
, et al. (57 additional authors not shown)
Abstract:
Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spanning nine common imaging modalities from over 20 different institutions. The top teams developed lightweight segmentation foundation models and implemented an efficient inference pipeline that substantially reduced computational requirements while maintaining state-of-the-art segmentation accuracy. Moreover, the post-challenge phase advanced the algorithms through the design of performance booster and reproducibility tasks, resulting in improved algorithms and validated reproducibility of the winning solution. Furthermore, the best-performing algorithms have been incorporated into the open-source software with a user-friendly interface to facilitate clinical adoption. The data and code are publicly available to foster the further development of medical image segmentation foundation models and pave the way for impactful real-world applications.
Submitted 20 December, 2024;
originally announced December 2024.
-
UITrans: Seamless UI Translation from Android to HarmonyOS
Authors:
Lina Gong,
Chen Wang,
Yujun Huang,
Di Cui,
Mingqiang Wei
Abstract:
Seamless user interface (i.e., UI) translation has emerged as a pivotal technique for modern mobile developers, addressing the challenge of developing separate UI applications for Android and HarmonyOS platforms due to fundamental differences in layout structures and development paradigms. In this paper, we present UITrans, the first automated UI translation tool designed for Android to HarmonyOS. UITrans leverages an LLM-driven multi-agent reflective collaboration framework to convert Android XML layouts into HarmonyOS ArkUI layouts. It not only maps component-level and page-level elements to ArkUI equivalents but also handles project-level challenges, including complex layouts and interaction logic. Our evaluation of six Android applications demonstrates that our UITrans achieves translation success rates of over 90.1%, 89.3%, and 89.2% at the component, page, and project levels, respectively. UITrans is available at https://github.com/OpenSELab/UITrans and the demo video can be viewed at https://www.youtube.com/watch?v=iqKOSmCnJG0.
Submitted 5 February, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Authors:
Rui Zou,
Mengqi Wei,
Jintian Feng,
Qian Wan,
Jianwen Sun,
Sannyuya Liu
Abstract:
In recent years, large language models have shown exceptional performance in fulfilling diverse human needs. However, their training data can introduce harmful content, underscoring the necessity for robust value alignment. Mainstream methods, which depend on feedback learning and supervised training, are resource-intensive and may constrain the full potential of the models. Multi-Agent Debate (MAD) offers a more efficient and innovative solution by enabling the generation of reliable answers through agent interactions. To apply MAD to value alignment, we examine the relationship between the helpfulness and harmlessness of debate outcomes and individual responses, and propose a MAD-based framework, Gradual Vigilance and Interval Communication (GVIC). GVIC allows agents to assess risks with varying levels of vigilance and to exchange diverse information through interval communication. We theoretically prove that GVIC optimizes debate efficiency while reducing communication overhead. Experimental results demonstrate that GVIC consistently outperforms baseline methods across various tasks and datasets, particularly excelling in harmfulness mitigation and fraud prevention. Additionally, GVIC exhibits strong adaptability across different base model sizes, including both unaligned and aligned models, and across various task types.
Submitted 17 December, 2024;
originally announced December 2024.
-
A Survey on Sequential Recommendation
Authors:
Liwei Pan,
Weike Pan,
Meiyan Wei,
Hongzhi Yin,
Zhong Ming
Abstract:
Different from most conventional recommendation problems, sequential recommendation (SR) focuses on learning users' preferences by exploiting the internal order and dependency among the interacted items, which has received significant attention from both researchers and practitioners. In recent years, we have witnessed great progress and achievements in this field, necessitating a new survey. In this survey, we study the SR problem from a new perspective (i.e., the construction of an item's properties), and summarize the most recent techniques used in sequential recommendation such as pure ID-based SR, SR with side information, multi-modal SR, generative SR, LLM-powered SR, ultra-long SR and data-augmented SR. Moreover, we introduce some frontier research topics in sequential recommendation, e.g., open-domain SR, data-centric SR, cloud-edge collaborative SR, continuous SR, SR for good, and explainable SR. We believe that our survey can serve as a valuable roadmap for readers in this field.
Submitted 13 March, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
Observation of a spectral hardening in cosmic ray boron spectrum with the DAMPE space mission
Authors:
DAMPE Collaboration,
F. Alemanno,
C. Altomare,
Q. An,
P. Azzarello,
F. C. T. Barbato,
P. Bernardini,
X. J. Bi,
H. Boutin,
I. Cagnoli,
M. S. Cai,
E. Casilli,
E. Catanzani,
J. Chang,
D. Y. Chen,
J. L. Chen,
Z. F. Chen,
Z. X. Chen,
P. Coppin,
M. Y. Cui,
T. S. Cui,
Y. X. Cui,
I. De Mitri,
F. de Palma,
A. Di Giovanni
, et al. (121 additional authors not shown)
Abstract:
Secondary cosmic ray fluxes are important probes of the propagation and interaction of high-energy particles in the Galaxy. Recent measurements of primary and secondary cosmic ray nuclei have revealed unexpected spectral features that demand a deeper understanding. In this work we report the direct measurement of the cosmic ray boron spectrum from 10 GeV/n to 8 TeV/n with eight years of data collected by the Dark Matter Particle Explorer (DAMPE) mission. The measured spectrum shows an evident hardening at $182\pm24$ GeV/n with a spectral power index of $γ_1 = 3.02 \pm 0.01$ before the break and an index change of $Δγ= 0.31 \pm 0.05$ after the break. A simple power law model is disfavored at a confidence level of 8$σ$. Compared with the hardenings measured in the DAMPE proton and helium spectra, the secondary boron spectrum hardens roughly twice as much as these primaries, which is consistent with a propagation related mechanism to interpret the spectral hardenings of cosmic rays observed at hundreds of GeV/n.
Submitted 18 December, 2024; v1 submitted 16 December, 2024;
originally announced December 2024.
-
ESA: Example Sieve Approach for Multi-Positive and Unlabeled Learning
Authors:
Zhongnian Li,
Meng Wei,
Peng Ying,
Xinzheng Xu
Abstract:
Learning from Multi-Positive and Unlabeled (MPU) data has gradually attracted significant attention from practical applications. Unfortunately, the risk of MPU also suffers from the shift of minimum risk, particularly when the models are very flexible. In this paper, to alleviate the minimum-risk shift problem, we propose an Example Sieve Approach (ESA) to select examples for training a multi-class classifier. Specifically, we sieve out some examples by utilizing the Certain Loss (CL) value of each example in the training stage and analyze the consistency of the proposed risk estimator. Besides, we show that the estimation error of the proposed ESA obtains the optimal parametric convergence rate. Extensive experiments on various real-world datasets show that the proposed approach outperforms previous methods.
Submitted 3 December, 2024;
originally announced December 2024.
-
Learning from Concealed Labels
Authors:
Zhongnian Li,
Meng Wei,
Peng Ying,
Tongfeng Sun,
Xinzheng Xu
Abstract:
Annotating data for sensitive labels (e.g., disease, smoking) poses a potential threat to individual privacy in many real-world scenarios. To cope with this problem, we propose a novel setting to protect the privacy of each instance, namely learning from concealed labels for multi-class classification. Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage, which specifies "none" and some randomly sampled insensitive labels as the concealed label set used to annotate sensitive data. In this paper, we show that an unbiased estimator can be established from concealed data under mild assumptions, and that the learned multi-class classifier can not only accurately classify instances with insensitive labels but also recognize instances with sensitive labels. Moreover, we bound the estimation error and show that the multi-class classifier achieves the optimal parametric convergence rate. Experiments demonstrate the significance and effectiveness of the proposed method for concealed labels on synthetic and real-world datasets.
Submitted 3 December, 2024;
originally announced December 2024.
-
Incentive-Driven Task Offloading and Collaborative Computing in Device-Assisted MEC Networks
Authors:
Yang Li,
Xing Zhang,
Bo Lei,
Qianying Zhao,
Min Wei,
Zheyan Qu,
Wenbo Wang
Abstract:
Edge computing (EC), positioned near end devices, holds significant potential for delivering low-latency, energy-efficient, and secure services. This makes it a crucial component of the Internet of Things (IoT). However, the increasing number of IoT devices and emerging services place tremendous pressure on edge servers (ESs). To better handle dynamically arriving heterogeneous tasks, ESs and IoT devices with idle resources can collaborate in processing tasks. Considering the selfishness and heterogeneity of IoT devices and ESs, we propose an incentive-driven multi-level task allocation framework. Specifically, we categorize IoT devices into task IoT devices (TDs), which generate tasks, and auxiliary IoT devices (ADs), which have idle resources. We use a bargaining game to determine the initial offloading decision and the payment fee for each TD, as well as a double auction to incentivize ADs to participate in task processing. Additionally, we develop a priority-based inter-cell task scheduling algorithm to address the uneven distribution of user tasks across different cells. Finally, we theoretically analyze the performance of the proposed framework. Simulation results demonstrate that our proposed framework outperforms benchmark methods.
Submitted 30 November, 2024;
originally announced December 2024.