Search | arXiv e-print repository

CLAMM: a spin CLuster expansion--Monte Carlo toolkit for Alloys and Magnetic Materials

Authors: Brian Blankenau, Tianyu Su, Namhoon Kim, Elif Ertekin

Abstract: Finite-temperature magnetism gives rise to many phenomena in alloy materials, such as magnetic phase transformations, short or medium range order in magnetic alloys, spin waves, critical phenomena, and the magnetocaloric effect. Lattice models, such as the Ising, Potts, cluster expansion, and magnetic cluster expansion models, are powerful tools for studying complex magnetic alloys and compounds. In this paper we introduce CLAMM, which is a new open source toolkit for developing custom lattice models from density functional theory (DFT) data sets. The toolkit comprises three main components. The first component is CLAMM_Prep, a Python tool that converts data sets consisting of Vienna Ab-initio Simulation Package (VASP) DFT simulations into a compact format. The second component, CLAMM_Fit, is also Python-based and uses the compact data set to parameterize a lattice model, chosen from a set of available options (cluster expansion, Ising, and others). The third component is CLAMM_MC, which is a C++ Monte Carlo solver for generating ensembles of configurations, accounting for both magnetic and alloy configurational entropies, at different temperatures. These ensembles and their analysis can be used for simulating phase transformations and constructing phase diagrams. The code can also be used for generating special quasi-random structures and structures with user-defined short-range order. This document provides a comprehensive overview of each CLAMM tool in order to demonstrate CLAMM's potential for the computational materials community.

Submitted 21 June, 2025; originally announced June 2025.
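
The kind of sampling a lattice-model Monte Carlo solver performs can be illustrated with a minimal sketch. This is not CLAMM_MC (which the abstract describes as a C++ solver); it is a generic single-spin-flip Metropolis sampler for a 2D nearest-neighbour Ising model, and the lattice size, coupling, units (k_B = 1), and function names are all assumptions made for illustration.

```python
import math
import random


def metropolis_ising(size=8, temperature=2.0, sweeps=200, coupling=1.0, seed=0):
    """Sample a 2D Ising model (E = -J * sum of neighbour products) with Metropolis updates.

    Hypothetical helper for illustration; periodic boundaries, k_B = 1.
    Returns the final lattice as a list of rows of +1/-1 spins.
    """
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for _ in range(sweeps * size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        # Sum of the four nearest neighbours with periodic wrap-around.
        neighbours = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                      + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        # Energy change if spin (i, j) is flipped.
        delta_e = 2.0 * coupling * spins[i][j] * neighbours
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins):
    """Absolute magnetization per site, a value in [0, 1]."""
    n_sites = len(spins) * len(spins[0])
    return abs(sum(sum(row) for row in spins)) / n_sites
```

Running `magnetization(metropolis_ising(temperature=t))` over a range of `t` values is the usual way to look for an order-disorder transition in such a toy model.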

arXiv:2506.17088 [pdf, ps, other]

Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation

Authors: Jiahao Cheng, Tiancheng Su, Jia Yuan, Guoxiu He, Jiawei Liu, Xinqi Tao, Jingwen Xie, Huaxia Li

Abstract: Large Language Models (LLMs) often exhibit hallucinations, generating factually incorrect or semantically irrelevant content in response to prompts. Chain-of-Thought (CoT) prompting can mitigate hallucinations by encouraging step-by-step reasoning, but its impact on hallucination detection remains underexplored. To bridge this gap, we conduct a systematic empirical evaluation. We begin with a pilot experiment, revealing that CoT reasoning significantly affects the LLM's internal states and token probability distributions. Building on this, we evaluate the impact of various CoT prompting methods on mainstream hallucination detection methods across both instruction-tuned and reasoning-oriented LLMs. Specifically, we examine three key dimensions: changes in hallucination score distributions, variations in detection accuracy, and shifts in detection confidence. Our findings show that while CoT prompting helps reduce hallucination frequency, it also tends to obscure critical signals used for detection, impairing the effectiveness of various detection methods. Our study highlights an overlooked trade-off in the use of reasoning. Code is publicly available at: https://anonymous.4open.science/r/cot-hallu-detect.

Submitted 20 June, 2025; originally announced June 2025.

arXiv:2506.06710 [pdf, ps, other]

A Systematic Investigation on Deep Learning-Based Omnidirectional Image and Video Super-Resolution

Authors: Qianqian Zhao, Chunle Guo, Tianyi Zhang, Junpei Zhang, Peiyang Jia, Tan Su, Wenjie Jiang, Chongyi Li

Abstract: Omnidirectional image and video super-resolution is a crucial research topic in low-level vision, playing an essential role in virtual reality and augmented reality applications. Its goal is to reconstruct high-resolution images or video frames from low-resolution inputs, thereby enhancing detail preservation and enabling more accurate scene analysis and interpretation. In recent years, numerous innovative and effective approaches have been proposed, predominantly based on deep learning techniques, involving diverse network architectures, loss functions, projection strategies, and training datasets. This paper presents a systematic review of recent progress in omnidirectional image and video super-resolution, focusing on deep learning-based methods. Given that existing datasets predominantly rely on synthetic degradation and fall short in capturing real-world distortions, we introduce a new dataset, 360Insta, that comprises authentically degraded omnidirectional images and videos collected under diverse conditions, including varying lighting, motion, and exposure settings. This dataset addresses a critical gap in current omnidirectional benchmarks and enables more robust evaluation of the generalization capabilities of omnidirectional super-resolution methods. We conduct comprehensive qualitative and quantitative evaluations of existing methods on both public datasets and our proposed dataset. Furthermore, we provide a systematic overview of the current status of research and discuss promising directions for future exploration.
All datasets, methods, and evaluation metrics introduced in this work are publicly available and will be regularly updated. Project page: https://github.com/nqian1/Survey-on-ODISR-and-ODVSR.

Submitted 7 June, 2025; originally announced June 2025.

arXiv:2505.23134 [pdf, ps, other]

Zero-to-Hero: Zero-Shot Initialization Empowering Reference-Based Video Appearance Editing

Authors: Tongtong Su, Chengyu Wang, Jun Huang, Dongming Lu

Abstract: Appearance editing according to user needs is a pivotal task in video editing. Existing text-guided methods often lead to ambiguities regarding user intentions and restrict fine-grained control over editing specific aspects of objects. To overcome these limitations, this paper introduces a novel approach named Zero-to-Hero, which focuses on reference-based video editing that disentangles the editing process into two distinct problems. It achieves this by first editing an anchor frame to satisfy user requirements as a reference image and then consistently propagating its appearance across other frames. We leverage correspondence within the original frames to guide the attention mechanism, which is more robust than previously proposed optical flow or temporal modules in memory-friendly video generative models, especially when dealing with objects exhibiting large motions. It offers a solid ZERO-shot initialization that ensures both accuracy and temporal consistency. However, intervention in the attention mechanism results in compounded imaging degradation with over-saturated colors and unknown blurring issues. Starting from Zero-Stage, our Hero-Stage Holistically learns a conditional generative model for vidEo RestOration. To accurately evaluate the consistency of the appearance, we construct a set of videos with multiple appearances using Blender, enabling a fine-grained and deterministic evaluation. Our method outperforms the best-performing baseline with a PSNR improvement of 2.6 dB. The project page is at https://github.com/Tonniia/Zero2Hero.

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.23014 [pdf, ps, other]

Hyperbolic-PDE GNN: Spectral Graph Neural Networks in the Perspective of A System of Hyperbolic Partial Differential Equations

Authors: Juwei Yue, Haikuo Li, Jiawei Sheng, Xiaodong Li, Taoyu Su, Tingwen Liu, Li Guo

Abstract: Graph neural networks (GNNs) leverage message passing mechanisms to learn the topological features of graph data. Traditional GNNs learn node features in a spatial domain unrelated to the topology, which can hardly ensure topological features. In this paper, we formulate message passing as a system of hyperbolic partial differential equations (hyperbolic PDEs), constituting a dynamical system that explicitly maps node representations into a particular solution space. This solution space is spanned by a set of eigenvectors describing the topological structure of graphs. Within this system, for any moment in time, a node's features can be decomposed into a superposition of the basis of eigenvectors. This not only enhances the interpretability of message passing but also enables the explicit extraction of fundamental characteristics about the topological structure. Furthermore, by solving this system of hyperbolic partial differential equations, we establish a connection with spectral graph neural networks (spectral GNNs), serving as a message passing enhancement paradigm for spectral GNNs. We further introduce polynomials to approximate arbitrary filter functions. Extensive experiments demonstrate that the paradigm of hyperbolic PDEs not only exhibits strong flexibility but also significantly enhances the performance of various spectral GNNs across diverse graph tasks.

Submitted 28 May, 2025; originally announced May 2025.

Comments: 18 pages, 2 figures, published to ICML 2025

Journal ref: International Conference on Machine Learning 2025

arXiv:2505.15950 [pdf, ps, other]

Gaussian Processes in Power Systems: Techniques, Applications, and Future Works

Authors: Bendong Tan, Tong Su, Yu Weng, Ketian Ye, Parikshit Pareek, Petr Vorobev, Hung Nguyen, Junbo Zhao, Deepjyoti Deka

Abstract: The increasing integration of renewable energy sources (RESs) and distributed energy resources (DERs) has significantly heightened operational complexity and uncertainty in modern power systems. Concurrently, the widespread deployment of smart meters, phasor measurement units (PMUs) and other sensors has generated vast spatiotemporal data streams, enabling advanced data-driven analytics and decision-making in grid operations. In this context, Gaussian processes (GPs) have emerged as a powerful probabilistic framework, offering uncertainty quantification, non-parametric modeling, and predictive capabilities to enhance power system analysis and control. This paper presents a comprehensive review of GP techniques and their applications in power system operation and control. GP applications are reviewed across three key domains: GP-based modeling, risk assessment, and optimization and control. These areas serve as representative examples of how GP can be utilized in power systems. Furthermore, critical challenges in GP applications are discussed, and potential research directions are outlined to facilitate future power system operations.

Submitted 22 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

arXiv:2505.14212 [pdf, ps, other]

Automatic Dataset Generation for Knowledge Intensive Question Answering Tasks

Authors: Sizhe Yuen, Ting Su, Ziyang Wang, Yali Du, Adam J. Sobey

Abstract: A question-answering (QA) system searches for suitable answers within a knowledge base. Current QA systems struggle with queries requiring complex reasoning or real-time knowledge integration. They are often supplemented with retrieval techniques on a data source such as Retrieval-Augmented Generation (RAG). However, RAG continues to face challenges in handling complex reasoning and logical connections between multiple sources of information. A novel approach for enhancing Large Language Models (LLMs) in knowledge-intensive QA tasks is presented through the automated generation of context-based QA pairs. This methodology leverages LLMs to create fine-tuning data, reducing reliance on human labelling and improving model comprehension and reasoning capabilities. The proposed system includes an automated QA generator and a model fine-tuner, evaluated using perplexity, ROUGE, BLEU, and BERTScore. Comprehensive experiments demonstrate improvements in logical coherence and factual accuracy, with implications for developing adaptable Artificial Intelligence (AI) systems. Mistral-7b-v0.3 outperforms Llama-3-8b with BERT F1, BLEU, and ROUGE scores of 0.858, 0.172, and 0.260 for the LLM-generated QA pairs, compared to scores of 0.836, 0.083, and 0.139 for the human-annotated QA pairs.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.11997 [pdf, ps, other]

Multimodal Cancer Survival Analysis via Hypergraph Learning with Cross-Modality Rebalance

Authors: Mingcheng Qu, Guang Yang, Donglin Di, Tonghua Su, Yue Gao, Yang Song, Lei Fan

Abstract: Multimodal pathology-genomic analysis has become increasingly prominent in cancer survival prediction. However, existing studies mainly utilize multi-instance learning to aggregate patch-level features, neglecting the information loss of contextual and hierarchical details within pathology images. Furthermore, the disparity in data granularity and dimensionality between pathology and genomics leads to a significant modality imbalance. The high spatial resolution inherent in pathology data gives it a dominant role, overshadowing genomics in multimodal integration. In this paper, we propose a multimodal survival prediction framework that incorporates hypergraph learning to effectively capture both contextual and hierarchical details from pathology images. Moreover, it employs a modality rebalance mechanism and an interactive alignment fusion strategy to dynamically reweight the contributions of the two modalities, thereby mitigating the pathology-genomics imbalance. Quantitative and qualitative experiments are conducted on five TCGA datasets, demonstrating that our model outperforms advanced methods by over 3.4% in C-Index performance.

Submitted 20 May, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

Comments: accepted by IJCAI2025 Code: https://github.com/MCPathology/MRePath
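
For readers unfamiliar with the C-Index reported above, a minimal sketch of Harrell's concordance index follows. It is a generic textbook version, not the evaluation code of the paper; the function name and argument layout are assumptions. A pair of patients is comparable when the one with the shorter time had an observed event, and it is concordant when that patient was also assigned the higher predicted risk.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative helper).

    times:  observed follow-up times
    events: 1 if the event was observed, 0 if the sample is censored
    risks:  predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored sample cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of comparable pairs.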

arXiv:2505.11010 [pdf, other]

Review-Instruct: A Review-Driven Multi-Turn Conversations Generation Method for Large Language Models

Authors: Jiangxu Wu, Cong Wang, TianHuang Su, Jun Yang, Haozhi Lin, Chao Zhang, Ming Peng, Kai Shi, SongPan Yang, BinQing Pan, ZiXian Li, Ni Yang, ZhenYu Yang

Abstract: The effectiveness of large language models (LLMs) in conversational AI is hindered by their reliance on single-turn supervised fine-tuning (SFT) data, which limits contextual coherence in multi-turn dialogues. Existing methods for generating multi-turn dialogue data struggle to ensure both diversity and quality in instructions. To address this, we propose Review-Instruct, a novel framework that synthesizes multi-turn conversations through an iterative "Ask-Respond-Review" process involving three agent roles: a Candidate, multiple Reviewers, and a Chairman. The framework iteratively refines instructions by incorporating Reviewer feedback, enhancing dialogue diversity and difficulty. We construct a multi-turn dataset using the Alpaca dataset and fine-tune the LLaMA2-13B model. Evaluations on MT-Bench, MMLU-Pro, and Auto-Arena demonstrate significant improvements, achieving absolute gains of 2.9% on MMLU-Pro and 2% on MT-Bench compared to prior state-of-the-art models based on LLaMA2-13B. Ablation studies confirm the critical role of the Review stage and the use of multiple Reviewers in boosting instruction diversity and difficulty. Our work highlights the potential of review-driven, multi-agent frameworks for generating high-quality conversational data at scale.

Submitted 16 May, 2025; originally announced May 2025.

Comments: ACL2025 Accepted

arXiv:2505.08838 [pdf, ps, other]

Ultrasound Report Generation with Multimodal Large Language Models for Standardized Texts

Authors: Peixuan Ge, Tongkun Su, Faqin Lv, Baoliang Zhao, Peng Zhang, Chi Hong Wong, Liang Yao, Yu Sun, Zenan Wang, Pak Kin Wong, Ying Hu

Abstract: Ultrasound (US) report generation is a challenging task due to the variability of US images, operator dependence, and the need for standardized text. Unlike X-ray and CT, US imaging lacks consistent datasets, making automation difficult. In this study, we propose a unified framework for multi-organ and multilingual US report generation, integrating fragment-based multilingual training and leveraging the standardized nature of US reports. By aligning modular text fragments with diverse imaging data and curating a bilingual English-Chinese dataset, the method achieves consistent and clinically accurate text generation across organ sites and languages. Fine-tuning with selective unfreezing of the vision transformer (ViT) further improves text-image alignment. Compared to the previous state-of-the-art KMVE method, our approach achieves relative gains of about 2% in BLEU scores, approximately 3% in ROUGE-L, and about 15% in CIDEr, while significantly reducing errors such as missing or incorrect content. By unifying multi-organ and multi-language report generation into a single, scalable framework, this work demonstrates strong potential for real-world clinical workflows.

Submitted 19 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

arXiv:2504.19458 [pdf, other]

doi 10.1145/3726302.3730037

Mitigating Modality Bias in Multi-modal Entity Alignment from a Causal Perspective

Authors: Taoyu Su, Jiawei Sheng, Duohe Ma, Xiaodong Li, Juwei Yue, Mengxiao Song, Yingkai Tang, Tingwen Liu

Abstract: Multi-Modal Entity Alignment (MMEA) aims to retrieve equivalent entities from different Multi-Modal Knowledge Graphs (MMKGs), a critical information retrieval task. Existing studies have explored various fusion paradigms and consistency constraints to improve the alignment of equivalent entities, while overlooking that the visual modality may not always contribute positively. Empirically, entities with low-similarity images usually generate unsatisfactory performance, highlighting the limitation of overly relying on visual features. We believe the model can be biased toward the visual modality, leading to a shortcut image-matching task. To address this, we propose a counterfactual debiasing framework for MMEA, termed CDMEA, which investigates visual modality bias from a causal perspective. Our approach aims to leverage both visual and graph modalities to enhance MMEA while suppressing the direct causal effect of the visual modality on model predictions. By estimating the Total Effect (TE) of both modalities and excluding the Natural Direct Effect (NDE) of the visual modality, we ensure that the model predicts based on the Total Indirect Effect (TIE), effectively utilizing both modalities and reducing visual modality bias. Extensive experiments on 9 benchmark datasets show that CDMEA outperforms 14 state-of-the-art methods, especially in low-similarity, high-noise, and low-resource data scenarios.

Submitted 15 May, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

Comments: Accepted by SIGIR 2025, 11 pages, 10 figures, 4 tables
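
The relation between the three causal quantities named in the abstract can be written down as simple arithmetic. The sketch below uses the standard mediation-analysis identity TIE = TE - NDE; the abstract does not say how CDMEA estimates each score, so the three inputs and the function name are purely hypothetical placeholders.

```python
def total_indirect_effect(score_factual, score_visual_direct, score_reference):
    """Compute TIE = TE - NDE from three hypothetical alignment scores.

    score_factual:       score with all inputs at their observed values
    score_visual_direct: score when only the direct visual path sees the
                         observed input and everything else is held at reference
    score_reference:     score with all inputs at their reference values
    """
    total_effect = score_factual - score_reference
    natural_direct_effect = score_visual_direct - score_reference
    # The reference score cancels: TIE = score_factual - score_visual_direct.
    return total_effect - natural_direct_effect
```

Because the reference term cancels, a prediction based on TIE amounts to subtracting the visual-shortcut score from the full score.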

arXiv:2504.18594 [pdf, other]

A Simple DropConnect Approach to Transfer-based Targeted Attack

Authors: Tongrui Su, Qingbin Li, Shengyu Zhu, Wei Chen, Xueqi Cheng

Abstract: We study the problem of transfer-based black-box attacks, where adversarial samples generated using a single surrogate model are directly applied to target models. Compared with untargeted attacks, existing methods still have lower Attack Success Rates (ASRs) in the targeted setting, i.e., the obtained adversarial examples often overfit the surrogate model but fail to mislead other models. In this paper, we hypothesize that the pixels or features in these adversarial examples collaborate in a highly dependent manner to maximize the success of an adversarial attack on the surrogate model, which we refer to as perturbation co-adaptation. Then, we propose to Mitigate perturbation Co-adaptation by DropConnect (MCD) to enhance transferability, by creating diverse variants of the surrogate model at each optimization iteration. We conduct extensive experiments across various CNN- and Transformer-based models to demonstrate the effectiveness of MCD. In the challenging scenario of transferring from a CNN-based model to Transformer-based models, MCD achieves 13% higher average ASRs compared with state-of-the-art baselines. MCD boosts the performance of self-ensemble methods by bringing in more diversification across the variants while reserving sufficient semantic information for each variant. In addition, MCD attains the highest performance gain when scaling the compute of crafting adversarial examples.

Submitted 24 April, 2025; originally announced April 2025.
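
As background for the DropConnect operation the abstract builds on, here is a minimal sketch: each weight of a layer is independently zeroed with some probability, so every call yields a different variant of the same layer. This is a generic illustration only, not the MCD attack; the helper names, the plain list-of-lists weight layout, and the absence of any rescaling are assumptions.

```python
import random


def dropconnect(weights, drop_prob, rng):
    """Return a copy of `weights` with each entry zeroed with probability drop_prob."""
    return [[0.0 if rng.random() < drop_prob else w for w in row]
            for row in weights]


def linear(weights, inputs):
    """Apply a bias-free linear layer given as a list of weight rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Drawing a fresh mask per iteration gives a different model variant each time.
rng = random.Random(0)
layer = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
variant_outputs = [linear(dropconnect(layer, 0.3, rng), [1.0, 2.0, 3.0])
                   for _ in range(4)]
```

Unlike dropout, which masks activations, the mask here is applied to individual weights.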

arXiv:2504.12027 [pdf, other]

Understanding Attention Mechanism in Video Diffusion Models

Authors: Bingyan Liu, Chengyu Wang, Tongtong Su, Huan Ten, Jun Huang, Kailing Guo, Kui Jia

Abstract: Text-to-video (T2V) synthesis models, such as OpenAI's Sora, have garnered significant attention due to their ability to generate high-quality videos from a text prompt. In diffusion-based T2V models, the attention mechanism is a critical component. However, it remains unclear what intermediate features are learned and how attention blocks in T2V models affect various aspects of video synthesis, such as image quality and temporal consistency. In this paper, we conduct an in-depth perturbation analysis of the spatial and temporal attention blocks of T2V models using an information-theoretic approach. Our results indicate that temporal and spatial attention maps affect not only the timing and layout of the videos but also the complexity of spatiotemporal elements and the aesthetic quality of the synthesized videos. Notably, high-entropy attention maps are often key elements linked to superior video quality, whereas low-entropy attention maps are associated with the video's intra-frame structure. Based on our findings, we propose two novel methods to enhance video quality and enable text-guided video editing. These methods rely entirely on lightweight manipulation of the attention matrices in T2V models. The efficacy and effectiveness of our methods are further validated through experimental evaluation across multiple datasets.

Submitted 16 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.
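
To make the notion of a high-entropy versus low-entropy attention map concrete, the sketch below computes the Shannon entropy of a single softmax attention row. The abstract does not specify the exact measure used in the paper, so this is only the textbook definition, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(attention_row):
    """Shannon entropy (in nats) of one row of attention weights."""
    return -sum(p * math.log(p) for p in attention_row if p > 0.0)
```

A row that attends uniformly over n tokens has the maximum entropy log(n), while a row that puts all weight on one token has entropy 0.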

arXiv:2504.10214 [pdf, other]

Balancing Stability and Plasticity in Pretrained Detector: A Dual-Path Framework for Incremental Object Detection

Authors: Songze Li, Qixing Xu, Tonghua Su, Xu-Yao Zhang, Zhongjie Wang

Abstract: The balance between stability and plasticity remains a fundamental challenge in pretrained model-based incremental object detection (PTMIOD). While existing PTMIOD methods demonstrate strong performance on in-domain tasks aligned with pretraining data, their plasticity to cross-domain scenarios remains underexplored. Through systematic component-wise analysis of pretrained detectors, we reveal a fundamental discrepancy: the localization modules demonstrate inherent cross-domain stability, preserving precise bounding box estimation across distribution shifts, while the classification components require enhanced plasticity to mitigate discriminability degradation in cross-domain scenarios. Motivated by these findings, we propose a dual-path framework built upon pretrained DETR-based detectors which decouples localization stability and classification plasticity: the localization path maintains stability to preserve pretrained localization knowledge, while the classification path facilitates plasticity via parameter-efficient fine-tuning and resists forgetting with pseudo-feature replay. Extensive evaluations on both in-domain (MS COCO and PASCAL VOC) and cross-domain (TT100K) benchmarks show state-of-the-art performance, demonstrating our method's ability to effectively balance stability and plasticity in PTMIOD, achieving robust cross-domain adaptation and strong retention of anti-forgetting capabilities.

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.06521 [pdf, other]

DUKAE: DUal-level Knowledge Accumulation and Ensemble for Pre-Trained Model-Based Continual Learning

Authors: Songze Li, Tonghua Su, Xu-Yao Zhang, Qixing Xu, Zhongjie Wang

Abstract: Pre-trained model-based continual learning (PTMCL) has garnered growing attention, as it enables more rapid acquisition of new knowledge by leveraging the extensive foundational understanding inherent in a pre-trained model (PTM). Most existing PTMCL methods use Parameter-Efficient Fine-Tuning (PEFT) to learn new knowledge while consolidating existing memory. However, they often face some challenges. A major challenge lies in the misalignment of classification heads, as the classification head of each task is trained within a distinct feature space, leading to inconsistent decision boundaries across tasks and, consequently, increased forgetting. Another critical limitation stems from the restricted feature-level knowledge accumulation, with feature learning typically restricted to the initial task only, which constrains the model's representation capabilities. To address these issues, we propose a method named DUal-level Knowledge Accumulation and Ensemble (DUKAE) that leverages both feature-level and decision-level knowledge accumulation by aligning classification heads into a unified feature space through Gaussian distribution sampling and introducing an adaptive expertise ensemble to fuse knowledge across feature subspaces. Extensive experiments on CIFAR-100, ImageNet-R, CUB-200, and Cars-196 datasets demonstrate the superior performance of our approach.

Submitted 14 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

arXiv:2504.06342 [pdf, other]

MACER3D -- an upgrade of MACER2D with enhanced subgrid models and gas physics -- and its application to simulating AGN feedback in a massive elliptical galaxy

Authors: Haoen Zhang, Haojie Xia, Suoqing Ji, Feng Yuan, Minhang Guo, Rui Zhang, Bocheng Zhu, Yihuan Di, Aoyun He, Tingfang Su, Yuxuan Zou

Abstract: We present MACER3D (Multiscale AGN-regulated Cosmic Ecosystem Resolver in 3D), a new suite of three-dimensional hydrodynamic simulations that study active galactic nuclei (AGN) feedback on galactic scales over Gyr duration, with major enhancement in subgrid models and gas physics over its predecessor -- MACER (Massive AGN Controlled Ellipticals Resolved) which is in two dimensions (hereafter MACER2D). MACER3D resolves gas dynamics from within the Bondi radius ($\sim 25\,\mathrm{pc}$) to halo scales. Combined with black hole accretion theory, it enables an accurate calculation of AGN outputs and subsequently their large-scale feedback effects. We present results from simulating an isolated elliptical galaxy with different feedback configurations. In the fiducial model with both AGN and supernova (SN) feedback, the temporal evolution of AGN luminosity and star formation rate are strongly correlated, suggesting shared dependence on the availability of gas supply for SMBH accretion and star formation. AGN duty cycles of several percent with a single-cycle timescale of $\sim 10^2\,\mathrm{Myr}$ agree with observations, while models with only AGN or SN feedback fail to reproduce observed cycles. While all models maintain a quiescent galaxy state, the fiducial AGN+SN feedback model results in higher star formation than the no-SN feedback model, suggesting SN feedback, when acting synergistically with AGN feedback, may positively impact star formation. Combined AGN and SN feedback enhances halo-scale metal enrichment compared to single-feedback models.
The simulated X-ray properties match observations and predict transient cavities produced by cold-mode AGN winds from past burst events. The differences between the results obtained by MACER2D and MACER3D are also discussed.

Submitted 8 April, 2025; originally announced April 2025.

Comments: 28 pages, 10 figures, accepted for publication in ApJ

arXiv:2504.00521 [pdf, other]

Automated detection of atomicity violations in large-scale systems

Authors: Hang He, Yixing Luo, Chengcheng Wan, Ting Su, Haiying Sun, Geguang Pu

Abstract: Atomicity violations in interrupt-driven programs pose a significant threat to software safety in critical systems. These violations occur when the execution sequence of operations on shared resources is disrupted by asynchronous interrupts. Detecting atomicity violations is challenging due to the vast program state space, application-level code dependencies, and complex domain-specific knowledge. We propose Clover, a hybrid framework that integrates static analysis with large language model (LLM) agents to detect atomicity violations in real-world programs. Clover first performs static analysis to extract critical code snippets and operation information. It then initiates a multi-agent process, where the expert agent leverages domain-specific knowledge to detect atomicity violations, which are subsequently validated by the judge agent. Evaluations on RaceBench 2.1, SV-COMP, and RWIP demonstrate that Clover achieves a precision/recall of 92.3%/86.6%, outperforming existing approaches by 27.4-118.2% on F1-score.

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2504.00380 [pdf, other]

Hierarchical Flow Diffusion for Efficient Frame Interpolation

Authors: Yang Hai, Guo Wang, Tan Su, Wenjie Jiang, Yinlin Hu

Abstract: Most recent diffusion-based methods still show a large gap compared to non-diffusion methods for video frame interpolation, in both accuracy and efficiency. Most of them formulate the problem as a denoising procedure directly in latent space, which is less effective because of the large latent space. We propose to model bilateral optical flow explicitly with hierarchical diffusion models, which have a much smaller search space in the denoising procedure. Based on the flow diffusion model, we then use a flow-guided image synthesizer to produce the final result. We train the flow diffusion model and the image synthesizer end to end. Our method achieves state-of-the-art accuracy and is 10+ times faster than other diffusion-based methods. The project page is at: https://hfd-interpolation.github.io.

Submitted 31 March, 2025; originally announced April 2025.

Comments: Accepted by CVPR 2025

arXiv:2503.17638 [pdf, other]

Collective Wisdom: Policy Averaging with an Application to the Newsvendor Problem

Authors: Xiangyu Cui, Nicholas G. Hall, Yun Shi, Tianyuan Su

Abstract: We propose a Policy Averaging Approach (PAA) that synthesizes the strengths of existing approaches to create more reliable, flexible and justifiable policies for stochastic optimization problems. An important component of the PAA is risk diversification to reduce the randomness of policies. A second component emulates model averaging from statistics. A third component involves using cross-validation to diversify and optimize weights among candidate policies. We demonstrate the use of the PAA for the newsvendor problem. For that problem, model-based approaches typically use specific and potentially unreliable assumptions of either independently and identically distributed (i.i.d.) demand or feature-dependent demand with covariates or autoregressive functions. Data-driven approaches, including sample averaging and the use of functions of covariates to set order quantities, typically suffer from overfitting and provide limited insights to justify recommended policies. By integrating concepts from statistics and finance, the PAA avoids these problems. We show using theoretical analysis, a simulation study, and an empirical study, that the PAA outperforms all those earlier approaches. The demonstrated benefits of the PAA include reduced expected cost, more stable performance, and improved insights to justify recommendations. Extensions to consider tail risk and the use of stratified sampling are discussed. Beyond the newsvendor problem, the PAA is applicable to a wide variety of decision-making problems under uncertainty.

Submitted 21 March, 2025; originally announced March 2025.
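
Illustrative note on the entry above: the abstract names sample averaging as one of the data-driven baselines for the newsvendor problem. The sketch below shows only that textbook baseline (order the empirical quantile of past demand at the critical ratio), not the PAA itself. The function names, cost parameters, and demand figures are hypothetical and do not come from the paper.

```python
import math


def saa_order_quantity(demands, underage_cost, overage_cost):
    """Sample-average newsvendor order: empirical demand quantile at the critical ratio.

    Hypothetical helper for illustration; this is the classical baseline,
    not the Policy Averaging Approach described in the abstract.
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    # Critical ratio cu / (cu + co) from the classical newsvendor solution.
    ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(demands)
    # Smallest observation whose empirical CDF value k/n reaches the ratio.
    k = max(1, math.ceil(ratio * len(ordered)))
    return ordered[k - 1]


def average_cost(quantity, demands, underage_cost, overage_cost):
    """Mean cost of ordering `quantity` against the observed demands."""
    total = sum(
        underage_cost * max(d - quantity, 0) + overage_cost * max(quantity - d, 0)
        for d in demands
    )
    return total / len(demands)


# Made-up demand history, for illustration only.
history = [10, 20, 30, 40, 50, 60, 70, 80]
q = saa_order_quantity(history, underage_cost=3, overage_cost=1)
```

With these made-up numbers the critical ratio is 0.75, so the sketch orders the sixth-smallest observation. Because it fits a single empirical quantile to the sample, this baseline is the kind of policy the abstract describes as prone to overfitting.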

arXiv:2503.16522 [pdf, other]

Adams Bashforth Moulton Solver for Inversion and Editing in Rectified Flow

Authors: Yongjia Ma, Donglin Di, Xuan Liu, Xiaokai Chen, Lei Fan, Wei Chen, Tonghua Su

Abstract: Rectified flow models have achieved remarkable performance in image and video generation tasks. However, existing numerical solvers face a trade-off between fast sampling and high-accuracy solutions, limiting their effectiveness in downstream applications such as reconstruction and editing. To address this challenge, we propose leveraging the Adams-Bashforth-Moulton (ABM) predictor-corrector method to enhance the accuracy of ODE solving in rectified flow models. Specifically, we introduce ABM-Solver, which integrates a multi-step predictor-corrector approach to reduce local truncation errors and employs Adaptive Step Size Adjustment to improve sampling speed. Furthermore, to effectively preserve non-edited regions while facilitating semantic modifications, we introduce a Mask Guided Feature Injection module. We estimate self-similarity to generate a spatial mask that differentiates preserved regions from those available for editing. Extensive experiments on multiple high-resolution image datasets validate that ABM-Solver significantly improves inversion precision and editing quality, outperforming existing solvers without requiring additional training or optimization.

Submitted 16 March, 2025; originally announced March 2025.
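
Illustrative note on the entry above: the abstract builds on the Adams-Bashforth-Moulton predictor-corrector idea. The sketch below is a generic second-order version for a scalar ODE (a two-step Adams-Bashforth predictor followed by a trapezoidal Adams-Moulton corrector), assuming a fixed step size. It is not the paper's ABM-Solver and omits the adaptive step size and mask-guided feature injection; the function name and test problem are hypothetical.

```python
import math


def abm2_solve(f, t0, y0, t_end, n_steps):
    """Integrate y' = f(t, y) with a fixed-step, second-order ABM scheme.

    Hypothetical illustration only; not the ABM-Solver from the abstract.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    f_prev = f(t, y)

    # Bootstrap: a multistep method needs one extra starting value,
    # obtained here with a single Heun (explicit trapezoid) step.
    y_euler = y + h * f_prev
    y = y + 0.5 * h * (f_prev + f(t + h, y_euler))
    t += h

    for _ in range(n_steps - 1):
        f_curr = f(t, y)
        # Predictor: two-step Adams-Bashforth extrapolation.
        y_pred = y + 0.5 * h * (3.0 * f_curr - f_prev)
        # Corrector: trapezoidal Adams-Moulton step using the predicted slope.
        y = y + 0.5 * h * (f_curr + f(t + h, y_pred))
        t += h
        f_prev = f_curr
    return y


# Test problem y' = -y, y(0) = 1, whose exact solution is exp(-t).
approx = abm2_solve(lambda t, y: -y, 0.0, 1.0, 1.0, 100)
error = abs(approx - math.exp(-1.0))
```

The corrector reuses the slope at the predicted point, so each step costs two evaluations of `f` while keeping second-order accuracy. Higher-order variants follow the same predict-then-correct pattern with more stored slopes.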

arXiv:2502.19008 [pdf, other]

Binary Neural Networks for Large Language Model: A Survey

Authors: Liangdong Liu, Zhitong Zheng, Cong Wang, Tianhuang Su, Zhenyu Yang

Abstract: Large language models (LLMs), such as GPT-4 and Llama, have wide applications in the field of natural language processing (NLP). However, with the exponential growth of model parameter sizes, LLMs bring significant resource overheads. Low-bit quantization, as a key technique, reduces memory usage and computational demands by decreasing the bit-width of model parameters, activations, and gradients. Previous quantization methods for LLMs have largely employed Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ does not require any retraining of the original model, while QAT involves optimizing precision during training to achieve the best quantization parameters. The BitNet team proposed a radically different approach, where quantization is performed from the start of model training, utilizing low-precision binary weights during the training process. This approach has led to the emergence of many binary quantization techniques for large language models. This paper provides a comprehensive review of these binary quantization techniques. Specifically, we will introduce binary quantization techniques in deep neural networks and further explore their application to LLMs, reviewing their various contributions, implementations, and applications.

Submitted 26 February, 2025; originally announced February 2025.

Comments: 23 pages, 7 figures
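
Illustrative note on the entry above: a common form of binary weight quantization keeps only the sign of each weight plus one shared scale. The sketch below assumes that generic sign-and-scale scheme, with the scale set to the mean absolute weight. It is not taken from the survey or from BitNet, and the function names and weight values are hypothetical.

```python
def binarize(weights):
    """Return (scale, signs) so that scale * sign approximates each weight.

    Hypothetical illustration of sign-and-scale binarization.
    """
    if not weights:
        raise ValueError("need at least one weight")
    # One shared scale: the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Each weight is reduced to a single bit of information: its sign.
    signs = [1 if w >= 0 else -1 for w in weights]
    return scale, signs


def dequantize(scale, signs):
    """Rebuild approximate full-precision weights from the binary form."""
    return [scale * s for s in signs]


# Made-up weights, for illustration only.
scale, signs = binarize([0.5, -1.5, 1.0, -1.0])
restored = dequantize(scale, signs)
```

Storing one sign bit per weight and a single scale per group is what yields the memory savings that the abstract attributes to low-bit quantization.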

arXiv:2502.02457 [pdf, other]

Orientation-aware interaction-based deep material network in polycrystalline materials modeling

Authors: Ting-Ju Wei, Tung-Huan Su, Chuin-Shan Chen

Abstract: Multiscale simulations are indispensable for connecting microstructural features to the macroscopic behavior of polycrystalline materials, but their high computational demands limit their practicality. Deep material networks (DMNs) have been proposed as efficient surrogate models, yet they fall short of capturing texture evolution. To address this limitation, we propose the orientation-aware interaction-based deep material network (ODMN), which incorporates an orientation-aware mechanism and an interaction mechanism grounded in the Hill-Mandel principle. The orientation-aware mechanism learns the crystallographic textures, while the interaction mechanism captures stress-equilibrium directions among representative volume element (RVE) subregions, offering insight into internal microstructural mechanics. Notably, ODMN requires only linear elastic data for training yet generalizes effectively to complex nonlinear and anisotropic responses. Our results show that ODMN accurately predicts both mechanical responses and texture evolution under complex plastic deformation, thus expanding the applicability of DMNs to polycrystalline materials. By balancing computational efficiency with predictive fidelity, ODMN provides a robust framework for multiscale simulations of polycrystalline materials.

Submitted 4 February, 2025; originally announced February 2025.

arXiv:2501.14659 [pdf, other]

Towards Unified Structured Light Optimization

Authors: Tinglei Wan, Tonghua Su, Zhongjie Wang

Abstract: Structured light (SL) 3D reconstruction captures the precise surface shape of objects, providing high-accuracy 3D data essential for industrial inspection and robotic vision systems. However, current research on optimizing projection patterns in SL 3D reconstruction faces two main limitations: each scene requires separate training of calibration parameters, and optimization is restricted to specific types of SL, which restricts their application range. To tackle these limitations, we present a unified framework for SL optimization, adaptable to diverse lighting conditions, object types, and different types of SL. Our framework quickly determines the optimal projection pattern using only a single projected image. Key contributions include a novel global matching method for projectors, enabling precise projector-camera alignment with just one projected image, and a new projection compensation model with a photometric adjustment module to reduce artifacts from out-of-gamut clipping. Experimental results show our method achieves superior decoding accuracy across various objects, SL patterns, and lighting conditions, significantly outperforming previous methods.

Submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.10793 [pdf, other]

Modeling the Spectral Energy Distribution of Active Galactic Nuclei: Implications for Cosmological Simulations of Galaxy Formation

Authors: Tong Su, Qi Guo, Erlin Qiao, Wenxiang Pei, Luis C. Ho, Cedric G. Lacey

Abstract: Modeling the spectral energy distribution (SED) of active galactic nuclei (AGN) plays a very important role in constraining modern cosmological simulations of galaxy formation. Here, we utilize an advanced supermassive black hole (SMBH) accretion disk model to compute the accretion flow structure and AGN SED across a wide range of black hole mass ($M_{\rm SMBH}$) and dimensionless accretion rates $\dot{m}(\equiv \dot{M}_{\rm acc}/\dot{M}_\mathrm{Edd})$, where $\dot{M}_{\rm acc}$ is the mass flow rate through the disk and $\dot{M}_\mathrm{Edd}$ is the Eddington mass accretion rate. We find that the radiative efficiency is mainly influenced by $\dot m$, while contributions of $M_{\rm SMBH}$ and $\dot{m}$ to the bolometric luminosity are comparably important. We have developed new scaling relationships that relate the bolometric luminosity of an AGN to its luminosities in the hard X-ray, soft X-ray, and optical bands. Our results align with existing literature at high luminosities but suggest lower luminosities in the hard and soft X-ray bands for AGNs with low bolometric luminosities than commonly reported values. Combining with the semi-analytical model of galaxy formation \textsc{L-Galaxies} and Millennium dark matter simulation for the distribution of ($M_{\rm SMBH}, \dot{m}$) at different redshift, we find the model predictions align well with observational data at redshifts below 1 but deviate for higher redshifts regarding AGN detection fraction and luminosity functions. This deviation may arise from improper treatment of SMBH growth at high redshifts in the model or bias from limited observational data. This AGN SED calculation can be readily applied in other cosmological simulations.

Submitted 18 January, 2025; originally announced January 2025.

Comments: Submitted to ApJ, comments are welcomed

arXiv:2501.03582 [pdf, other]

Exact Decoding of Repetition Code under Circuit Level Noise

Authors: Hanyan Cao, Shoukuan Zhao, Dongyang Feng, Zisong Shen, Haisheng Yan, Tang Su, Weijie Sun, Huikai Xu, Feng Pan, Haifeng Yu, Pan Zhang

Abstract: Repetition code forms a fundamental basis for quantum error correction experiments. To date, it stands as the sole code that has achieved large distances and extremely low error rates. Its applications span the spectrum of evaluating hardware limitations, pinpointing hardware defects, and detecting rare events. However, current methods for decoding repetition codes under circuit level noise are suboptimal, leading to inaccurate error correction thresholds and introducing additional errors in event detection. In this work, we establish that repetition code under circuit level noise has an exact solution, and we propose an optimal maximum likelihood decoding algorithm called planar. The algorithm is based on the exact solution of the spin glass partition function on planar graphs and has polynomial computational complexity. Through extensive numerical experiments, we demonstrate that our algorithm uncovers the exact threshold for depolarizing noise and realistic superconductor SI1000 noise. Furthermore, we apply our method to analyze data from recent quantum memory experiments conducted by Google Quantum AI, revealing that part of the error floor was attributed to the decoding algorithm used by Google. Finally, we implemented the repetition code quantum memory on superconducting systems with a 72-qubit quantum chip lacking reset gates, demonstrating that even with an unknown error model, the proposed algorithm achieves a significantly lower logical error rate than the matching-based algorithm.

Submitted 7 January, 2025; originally announced January 2025.

arXiv:2501.02863 [pdf, other]

Beyond Pass or Fail: Multi-Dimensional Benchmarking of Foundation Models for Goal-based Mobile UI Navigation

Authors: Dezhi Ran, Mengzhou Wu, Hao Yu, Yuetong Li, Jun Ren, Yuan Cao, Xia Zeng, Haochuan Lu, Zexin Xu, Mengqian Xu, Ting Su, Liangchao Yao, Ting Xiong, Wei Yang, Yuetang Deng, Assaf Marron, David Harel, Tao Xie

Abstract: Recent advances of foundation models (FMs) have made navigating mobile applications (apps) based on high-level goal instructions within reach, with significant industrial applications such as UI testing. While existing benchmarks evaluate FM-based UI navigation using the binary pass/fail metric, they have two major limitations: they cannot reflect the complex nature of mobile UI navigation where FMs may fail for various reasons (e.g., misunderstanding instructions and failed planning), and they lack industrial relevance due to oversimplified tasks that poorly represent real-world scenarios. To address the preceding limitations, we propose Sphinx, a comprehensive benchmark for multi-dimensional evaluation of FMs in industrial settings of UI navigation. Sphinx introduces a specialized toolkit that evaluates five essential FM capabilities, providing detailed insights into failure modes such as insufficient app knowledge or planning issues. Using both popular Google Play applications and WeChat's internal UI test cases, we evaluate 8 FMs with 20 different configurations. Our results show that existing FMs universally struggle with goal-based testing tasks, primarily due to insufficient UI-specific capabilities. We summarize seven lessons learned from benchmarking FMs with Sphinx, providing clear directions for improving FM-based mobile UI navigation.

Submitted 11 February, 2025; v1 submitted 6 January, 2025; originally announced January 2025.

arXiv:2501.00873 [pdf, other]

Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation

Authors: Mingjia Li, Shuang Li, Tongrui Su, Longhui Yuan, Jian Liang, Wei Li

Abstract: Capitalizing on the complementary advantages of generative and discriminative models has always been a compelling vision in machine learning, backed by a growing body of research. This work discloses the hidden semantic structure within score-based generative models, unveiling their potential as effective discriminative priors. Inspired by our theoretical findings, we propose DUSA to exploit the structured semantic priors underlying diffusion score to facilitate the test-time adaptation of image classifiers or dense predictors. Notably, DUSA extracts knowledge from a single timestep of denoising diffusion, lifting the curse of Monte Carlo-based likelihood estimation over timesteps. We demonstrate the efficacy of our DUSA in adapting a wide variety of competitive pre-trained discriminative models on diverse test-time scenarios. Additionally, a thorough ablation study is conducted to dissect the pivotal elements in DUSA. Code is publicly available at https://github.com/BIT-DA/DUSA.

Submitted 1 January, 2025; originally announced January 2025.

Comments: Accepted by NeurIPS 2024. Project page: https://kiwixr.github.io/projects/dusa

arXiv:2412.19522 [pdf, other]

Exploiting Domain-Specific Parallel Data on Multilingual Language Models for Low-resource Language Translation

Authors: Surangika Ranathungaa, Shravan Nayak, Shih-Ting Cindy Huang, Yanke Mao, Tong Su, Yun-Hsiang Ray Chan, Songchen Yuan, Anthony Rinaldi, Annie En-Shiun Lee

Abstract: Neural Machine Translation (NMT) systems built on multilingual sequence-to-sequence Language Models (msLMs) fail to deliver expected results when the amount of parallel data for a language, as well as the language's representation in the model are limited. This restricts the capabilities of domain-specific NMT systems for low-resource languages (LRLs). As a solution, parallel data from auxiliary domains can be used either to fine-tune or to further pre-train the msLM. We present an evaluation of the effectiveness of these two techniques in the context of domain-specific LRL-NMT. We also explore the impact of domain divergence on NMT model performance. We recommend several strategies for utilizing auxiliary parallel data in building domain-specific NMT models for LRLs.

Submitted 27 December, 2024; originally announced December 2024.

arXiv:2412.18208 [pdf, other]

doi 10.1103/5lfr-xb8m

Quantum framework for Reinforcement Learning: Integrating Markov decision process, quantum arithmetic, and trajectory search

Authors: Thet Htar Su, Shaswot Shresthamali, Masaaki Kondo

Abstract: This paper introduces a quantum framework for addressing reinforcement learning (RL) tasks, grounded in the quantum principles and leveraging a fully quantum model of the classical Markov decision process (MDP). By employing quantum concepts and a quantum search algorithm, this work presents the implementation and optimization of the agent-environment interactions entirely within the quantum domain, eliminating reliance on classical computations. Key contributions include the quantum-based state transitions, return calculation, and trajectory search mechanism that utilize quantum principles to demonstrate the realization of RL processes through quantum phenomena. The implementation emphasizes the fundamental role of quantum superposition in enhancing computational efficiency for RL tasks. Results demonstrate the capacity of a quantum model to achieve quantum enhancement in RL, highlighting the potential of fully quantum implementations in decision-making tasks. This work not only underscores the applicability of quantum computing in machine learning but also contributes to the field of quantum reinforcement learning (QRL) by offering a robust framework for understanding and exploiting quantum computing in RL systems.

Submitted 28 May, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

Journal ref: Physical Review A (2025)

arXiv:2412.12862 [pdf, other]

Scatter correction for photon-counting detector based CBCT imaging

Authors: Xin Zhang, Ting Su, Jiongtao Zhu, Hairong Zheng, Dong Liang, Yongshuai Ge

Abstract: Objective: The aim of this study is to validate the effectiveness of an energy-modulated scatter correction method in suppressing scatter in photon-counting detector (PCD)-based cone beam CT (CBCT) imaging. Approach: The scatter correction method, named e-Grid, which was initially applied to dual-layer flat-panel detector (DLFPD)-based CBCT imaging, was tested for its performance in PCD-CBCT imaging. Benchtop PCD-CBCT imaging experiments were conducted to verify the effectiveness of the e-Grid method. Additionally, quantitative metrics were measured from these experimental results. Main results: It was found that the use of the e-Grid method could significantly eliminate cupping artifacts caused by Compton scatter in PCD-CBCT imaging. Meanwhile, its effectiveness was observed in both low- and high-energy images, as well as for objects of varying sizes. Quantitative results showed that the e-Grid method could reduce scatter artifacts by at least 71% in low-energy images and 75% in high-energy images. Significance: It was demonstrated that the scatter correction method originally applied to DLFPD-based CBCT could also perform well in PCD-CBCT, showing that the e-Grid method has great potential for application in other spectral CBCT imaging systems.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.08065 [pdf, other]

A Survey of Open-Source Power System Dynamic Simulators with Grid-Forming Inverter for Machine Learning Applications

Authors: Tong Su, Jiangkai Peng, Alaa Selim, Junbo Zhao, Jin Tan

Abstract: The emergence of grid-forming (GFM) inverter technology and the increasing role of machine learning in power systems highlight the need for evaluating the latest dynamic simulators. Open-source simulators offer distinct advantages in this field, being both free and highly customizable, which makes them well-suited for scientific research and validation of the latest models and methods. This paper provides a comprehensive survey and comparison of the latest open-source simulators that support GFM, with a focus on their capabilities and performance in machine-learning applications.

Submitted 10 December, 2024; originally announced December 2024.

arXiv:2412.06521 [pdf]

Ancient DNA from 120-Million-Year-Old Lycoptera Fossils Reveals Evolutionary Insights

Authors: Wan-Qian Zhao, Zhan-Yong Guo, Zeng-Yuan Tian, Tong-Fu Su, Gang-Qiang Cao, Zi-Xin Qi, Tian-Cang Qin, Wei Zhou, Jin-Yu Yang, Ming-Jie Chen, Xin-Ge Zhang, Chun-Yan Zhou, Chuan-Jia Zhu, Meng-Fei Tang, Di Wu, Mei-Rong Song, Yu-Qi Guo, Li-You Qiu, Fei Liang, Mei-Jun Li, Jun-Hui Geng, Li-Juan Zhao, Shu-Jie Zhang

Abstract: High quality ancient DNA (aDNA) is essential for molecular paleontology. Due to DNA degradation and contamination by environmental DNA (eDNA), current research is limited to fossils less than 1 million years old. The study successfully extracted DNA from Lycoptera davidi fossils from the Early Cretaceous period, dating 120 million years ago. Using high-throughput sequencing, 1,258,901 DNA sequences were obtained. We established a rigorous protocol known as the mega screen method. Using this method, we identified 243 original in situ DNA (oriDNA) sequences, likely from the Lycoptera genome. These sequences have an average length of over 100 base pairs and show no signs of deamination. Additionally, 10 transposase coding sequences were discovered, shedding light on a unique self-renewal mechanism in the genome. This study provides valuable DNA data for understanding ancient fish evolution and advances paleontological research.

Submitted 9 December, 2024; originally announced December 2024.

Comments: 14 pages, 3 figures

arXiv:2412.04072 [pdf, other]

Boundary-Guided Learning for Gene Expression Prediction in Spatial Transcriptomics

Authors: Mingcheng Qu, Yuncong Wu, Donglin Di, Anyang Su, Tonghua Su, Yang Song, Lei Fan

Abstract: Spatial transcriptomics (ST) has emerged as an advanced technology that provides spatial context to gene expression. Recently, deep learning-based methods have shown the capability to predict gene expression from WSI data using ST data. Existing approaches typically extract features from images and the neighboring regions using pretrained models, and then develop methods to fuse this information to generate the final output. However, these methods often fail to account for the cellular structure similarity, cellular density and the interactions within the microenvironment. In this paper, we propose a framework named BG-TRIPLEX, which leverages boundary information extracted from pathological images as guiding features to enhance gene expression prediction from WSIs. Specifically, our model consists of three branches: the spot, in-context and global branches. In the spot and in-context branches, boundary information, including edge and nuclei characteristics, is extracted using pretrained models. These boundary features guide the learning of cellular morphology and the characteristics of microenvironment through Multi-Head Cross-Attention. Finally, these features are integrated with global features to predict the final output. Extensive experiments were conducted on three public ST datasets. The results demonstrate that our BG-TRIPLEX consistently outperforms existing methods in terms of Pearson Correlation Coefficient (PCC). This method highlights the crucial role of boundary features in understanding the complex interactions between WSI and gene expression, offering a promising direction for future research.

Submitted 8 December, 2024; v1 submitted 5 December, 2024; originally announced December 2024.

Comments: 8 pages, 5 figures

arXiv:2411.09577 [pdf, other]

SimTube: Generating Simulated Video Comments through Multimodal AI and User Personas

Authors: Yu-Kai Hung, Yun-Chien Huang, Ting-Yu Su, Yen-Ting Lin, Lung-Pan Cheng, Bryan Wang, Shao-Hua Sun

Abstract: Audience feedback is crucial for refining video content, yet it typically comes after publication, limiting creators' ability to make timely adjustments. To bridge this gap, we introduce SimTube, a generative AI system designed to simulate audience feedback in the form of video comments before a video's release. SimTube features a computational pipeline that integrates multimodal data from the video, such as visuals, audio, and metadata, with user personas derived from a broad and diverse corpus of audience demographics, generating varied and contextually relevant feedback. Furthermore, the system's UI allows creators to explore and customize the simulated comments. Through a comprehensive evaluation comprising quantitative analysis, crowd-sourced assessments, and qualitative user studies, we show that SimTube's generated comments are not only relevant, believable, and diverse but often more detailed and informative than actual audience comments, highlighting its potential to help creators refine their content before release.

Submitted 17 November, 2024; v1 submitted 14 November, 2024; originally announced November 2024.

arXiv:2411.08329 [pdf, other]

doi 10.1109/TPWRS.2025.3577025

Neural Network Certification Informed Power System Transient Stability Preventive Control with Renewable Energy

Authors: Tong Su, Junbo Zhao

Abstract: Existing machine learning-based surrogate modeling methods for transient stability constrained-optimal power flow (TSC-OPF) lack certifications in the presence of unseen disturbances or uncertainties. This may lead to divergence of TSC-OPF or insecure control strategies. This paper proposes a neural network certification-informed power system transient stability preventive control method considering the impacts of various uncertainty resources, such as errors from measurements and fluctuations in renewable energy sources (RESs) and loads. A deep belief network (DBN) is trained to estimate the transient stability, replacing the time-consuming time-domain simulation-based calculations. Then, the DBN is embedded into the iterations of the primal-dual interior-point method to solve TSC-OPF. To guarantee the robustness of the solutions, the neural network verifier $α, β$-CROWN is proposed to deal with uncertainties from RESs and loads. The yielded certification results allow us to further adjust the transient stability safety margin under the iterated TSC-OPF solution process, balancing system security and economics. Numerical results on a modified western South Carolina 500-bus system demonstrate that the proposed method can efficiently and quickly obtain the safety-verified preventive control strategy through RES curtailment and generator dispatch with only a slight increase in cost.

Submitted 12 November, 2024; originally announced November 2024.

Journal ref: IEEE Transactions on Power Systems, 2025

arXiv:2410.19025 [pdf, other]

Large Language Models for Financial Aid in Financial Time-series Forecasting

Authors: Md Khairul Islam, Ayush Karmacharya, Timothy Sue, Judy Fox

Abstract: Considering the difficulty of financial time series forecasting in financial aid, much of the current research focuses on leveraging big data analytics in financial services. One modern approach is to utilize "predictive analysis", analogous to forecasting financial trends. However, many of these time series data in Financial Aid (FA) pose unique challenges due to limited historical datasets and high dimensional financial information, which hinder the development of effective predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models to demonstrate their ability to outperform traditional approaches, even with minimal ("few-shot") or no fine-tuning ("zero-shot"). Our benchmark study, which includes financial aid with seven other time series tasks, shows the potential of using LLMs for scarce financial datasets.

Submitted 24 October, 2024; originally announced October 2024.

Comments: GitHub link https://github.com/UVA-MLSys/Financial-Time-Series

arXiv:2410.12099 [pdf, ps, other]

The EMC Effect of Tritium and Helium-3 from the JLab MARATHON Experiment

Authors: D. Abrams, H. Albataineh, B. S. Aljawrneh, S. Alsalmi, D. Androic, K. Aniol, W. Armstrong, J. Arrington, H. Atac, T. Averett, C. Ayerbe Gayoso, X. Bai, J. Bane, S. Barcus, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, D. Blyth, W. Boeglin, D. Bulumulla, J. Butler, A. Camsonne, M. Carmignotto , et al. (109 additional authors not shown)

Abstract: Measurements of the EMC effect in the tritium and helium-3 mirror nuclei are reported. The data were obtained by the MARATHON Jefferson Lab experiment, which performed deep inelastic electron scattering from deuterium and the three-body nuclei, using a cryogenic gas target system and the High Resolution Spectrometers of the Hall A Facility of the Lab. The data cover the Bjorken $x$ range from 0.20 to 0.83, corresponding to a squared four-momentum transfer $Q^2$ range from 2.7 to 11.9 GeV$^2$, and to an invariant mass $W$ of the final hadronic state greater than 1.84 GeV/$c^2$. The tritium EMC effect measurement is the first of its kind. The MARATHON experimental results are compared to results from previous measurements by DESY-HERMES and JLab-Hall C experiments, as well as with few-body theoretical predictions.

Submitted 15 October, 2024; originally announced October 2024.

Comments: arXiv admin note: text overlap with arXiv:2104.05850

arXiv:2410.07151 [pdf, other]

FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset

Authors: Donglin Di, He Feng, Wenzhang Sun, Yongjia Ma, Hao Li, Wei Chen, Xiaofei Gou, Tonghua Su, Xun Yang

Abstract: Generating talking face videos from various conditions has recently become a highly popular research area within generative tasks. However, building a high-quality face video generation model requires a well-performing pre-trained backbone, a key obstacle that universal models fail to adequately address. Most existing works rely on universal video or image generation models and optimize control mechanisms, but they neglect the evident upper bound in video quality due to the limited capabilities of the backbones, which is a result of the lack of high-quality human face video datasets. In this work, we investigate the unsatisfactory results from related studies, gather and trim existing public talking face video datasets, and additionally collect and annotate a large-scale dataset, resulting in a comprehensive, high-quality multiracial face collection named FaceVid-1K. Using this dataset, we craft several effective pre-trained backbone models for face video generation. Specifically, we conduct experiments with several well-established video generation models, including text-to-video, image-to-video, and unconditional video generation, under various settings. We obtain the corresponding performance benchmarks and compare them with those trained on public datasets to demonstrate the superiority of our dataset. These experiments also allow us to investigate empirical strategies for crafting domain-specific video generation tasks with cost-effective settings. We will make our curated dataset, along with the pre-trained talking face video generation models, publicly available as a resource contribution to hopefully advance the research field.

Submitted 23 September, 2024; originally announced October 2024.

arXiv:2409.07144 [pdf, other]

Dual channel CW nnU-Net for 3D PET-CT Lesion Segmentation in 2024 autoPET III Challenge

Authors: Ching-Wei Wang, Ting-Sheng Su, Keng-Wei Liu

Abstract: PET/CT is extensively used in imaging malignant tumors because it highlights areas of increased glucose metabolism, indicative of cancerous activity. Accurate 3D lesion segmentation in PET/CT imaging is essential for effective oncological diagnostics and treatment planning. In this study, we developed an advanced 3D residual U-Net model for the Automated Lesion Segmentation in Whole-Body PET/CT - Multitracer Multicenter Generalization (autoPET III) Challenge, which will be held jointly with the 2024 Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in Marrakesh, Morocco. The proposed model incorporates a novel sample attention boosting technique to enhance segmentation performance by adjusting the contribution of challenging cases during training, improving generalization across FDG and PSMA tracers. The proposed model outperformed the challenge baseline model on the preliminary test set on the Grand Challenge platform, and our team currently ranks 2nd among 497 participants worldwide from 53 countries (accessed date: 2024/9/4), with a Dice score of 0.8700, a False Negative Volume of 19.3969 and a False Positive Volume of 1.0857.

Submitted 11 September, 2024; originally announced September 2024.

arXiv:2409.07035

Approximately counting maximal independent set is equivalent to #SAT

Authors: Hao Zhang, Tonghua Su

Abstract: A maximal independent set is an independent set that is not a subset of any other independent set. It is also a key problem in mathematics, computer science, and other fields. A counting problem is a type of computational problem that is associated with the number of solutions. Besides, counting problems help us better understand several fields such as algorithm analysis, complexity theory, artificial intelligence, etc. The problem of counting maximal independent sets is #P-complete, so it is natural to consider approximate counting for the maximal independent sets problem. In this article, we study the complexity of approximately counting maximal independent sets. Specifically, we are the first to prove that the #MIS problem is AP-interreducible with the #SAT of a given general graph.

Submitted 13 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: After discussion, this result is already known from a JCSS paper (arXiv:1411.6829), which proves that approximately counting MIS in bipartite graphs is equivalent to #SAT under AP-reductions. That result is stronger because it is restricted to bipartite graphs, and it implies the result for general graphs. Therefore, this paper tends to be more of a direct proof exercise

arXiv:2409.01994 [pdf, other]

BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering

Authors: Jiayi Jiang, Xiyuan Zhang, Chengcheng Wan, Haoyi Chen, Haiying Sun, Ting Su

Abstract: Protocol reverse engineering (PRE) aims to infer the specification of network protocols when the source code is not available. Specifically, field inference is one crucial step in PRE to infer the field formats and semantics. To perform field inference, binary analysis based PRE techniques are one major approach category. However, such techniques face two key challenges: (1) the format inference is fragile when the logic for processing input messages varies among different protocol implementations, and (2) the semantic inference is limited by inadequate and inaccurate inference rules. To tackle these challenges, we present BinPRE, a binary analysis based PRE tool. BinPRE incorporates (1) an instruction-based semantic similarity analysis strategy for format extraction; (2) a novel library composed of atomic semantic detectors for improving semantic inference adequacy; and (3) a cluster-and-refine paradigm to further improve semantic inference accuracy. We have evaluated BinPRE against five existing PRE tools, including Polyglot, AutoFormat, Tupni, BinaryInferno and DynPRE. The evaluation results on eight widely-used protocols show that BinPRE outperforms the prior PRE tools in both format and semantic inference. BinPRE achieves the perfection of 0.73 on format extraction and the F1-score of 0.74 (0.81) on semantic inference of types (functions), respectively. The field inference results of BinPRE have helped improve the effectiveness of protocol fuzzing by achieving 5-29% higher branch coverage, compared to those of the best prior PRE tool. BinPRE has also helped discover one new zero-day vulnerability, which otherwise cannot be found.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

arXiv:2408.17301 [pdf, ps, other]

Integral cohomology of dual boundary complexes is motivic

Authors: Tao Su

Abstract: In this note, we give a motivic characterization of the integral cohomology of dual boundary complexes of smooth quasi-projective complex algebraic varieties. As a corollary, the dual boundary complex of any stably affine space (of positive dimension) is contractible. In a separate paper [Su23], this corollary has been used by the author in his proof of the weak geometric P=W conjecture for very generic $GL_n(\mathbb{C})$-character varieties over any punctured Riemann surfaces.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 8 pages; Following the anonymous referee's suggestion, the original paper arXiv:2307.16657 (v3) has been separated into two: v4 of that paper keeps the main result; this one deals with the motivic part

MSC Class: 14C15 (Primary) 14F45; 14C30 (Secondary)

arXiv:2408.13855 [pdf, other]

An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues

Authors: Han Cui, Menglei Xie, Ting Su, Chengyu Zhang, Shin Hwei Tan

Abstract: Static code analyzers are widely used to help find program flaws. However, in practice the effectiveness and usability of such analyzers are affected by the problems of false negatives (FNs) and false positives (FPs). This paper aims to investigate the FNs and FPs of such analyzers from a new perspective, i.e., examining the historical issues of FNs and FPs of these analyzers reported by the maintainers, users and researchers in their issue repositories -- each of these issues manifested as a FN or FP of these analyzers in the history and has already been confirmed and fixed by the analyzers' developers. To this end, we conduct the first systematic study on a broad range of 350 historical issues of FNs/FPs from three popular static code analyzers (i.e., PMD, SpotBugs, and SonarQube). All these issues have been confirmed and fixed by the developers. We investigated these issues' root causes and the characteristics of the corresponding issue-triggering programs. The study reveals several new interesting findings and implications on mitigating FNs and FPs. Furthermore, guided by some findings of our study, we designed a metamorphic testing strategy to find FNs and FPs. This strategy successfully found 14 new issues of FNs/FPs, 11 of which have been confirmed and 9 have already been fixed by the developers. Our further manual investigation of the studied analyzers revealed one rule specification issue and four additional FNs/FPs due to the weaknesses of the implemented static analysis. We have made all the artifacts (datasets and tools) publicly available at https://zenodo.org/doi/10.5281/zenodo.11525129.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.04943 [pdf, other]

CBCT scatter correction with dual-layer flat-panel detector

Authors: Xin Zhang, Jixiong Xie, Ting Su, Jiongtao Zhu, Han Cui, Yuhang Tan, Dongmei Xia, Hairong Zheng, Dong Liang, Yongshuai Ge

Abstract: Background: Recently, the popularity of dual-layer flat-panel detector (DL-FPD) based dual-energy cone-beam CT (DE-CBCT) imaging has been increasing. However, the image quality of DE-CBCT remains constrained by the Compton scattered X-ray photons. Purpose: The objective of this study is to develop an energy-modulated scatter correction method for DL-FPD based CBCT imaging. Methods: In DL-FPD, a certain portion of the X-ray photons (mainly low-energy primary and scattered photons) passing through the object are captured by the top detector layer, while the remaining X-ray photons (mainly high-energy primary and scattered photons) are collected by the bottom detector layer. Based on the two sets of distinct low-energy and high-energy measurements, a linear signal model was approximated for the dual-energy primary and scattered signals on DL-FPD. The distributions of X-ray scatters were quickly estimated using this signal model. Monte Carlo (MC) simulation of a water phantom was conducted to verify this newly developed scatter estimation method. Moreover, physical experiments of water phantom, head phantom, and abdominal phantom were carried out to validate the real performance of this proposed scatter correction method. Results: The MC results showed that the e-Grid method was able to generate scatter distributions close to the ground truth. Moreover, the physical experiments demonstrated that the e-Grid method can greatly reduce the shading artifacts in both low-energy and high-energy CBCT images acquired from DL-FPD. On average, the image non-uniformity (NU) was reduced by over 77% in the low-energy CBCT image and by over 66% in the high-energy CBCT image. As a consequence, the accuracy of the decomposed multi-material bases was substantially improved.

Submitted 27 October, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

arXiv:2407.21332 [pdf, ps, other]

doi 10.1103/PhysRevApplied.23.014012

Multi-Purpose Architecture for Fast Reset and Protective Readout of Superconducting Qubits

Authors: Jiayu Ding, Yulong Li, He Wang, Guangming Xue, Tang Su, Chenlu Wang, Weijie Sun, Feiyu Li, Yujia Zhang, Yang Gao, Jun Peng, Zhi Hao Jiang, Yang Yu, Haifeng Yu, Fei Yan

Abstract: The ability to fast reset a qubit state is crucial for quantum information processing. However, to actively reset a qubit requires engineering a pathway to interact with a dissipative bath, which often comes with the cost of reduced qubit protection from the environment. Here, we present a novel multi-purpose architecture that enables fast reset and protection of superconducting qubits during control and readout. In our design, two on-chip diplexers are connected by two transmission lines. The high-pass branch provides a flat passband for convenient allocation of readout resonators above the qubit frequencies, which is preferred for reducing measurement-induced state transitions. In the low-pass branch, we leverage a standing-wave mode below the maximum qubit frequency for a rapid reset. The qubits are located in the common stopband to inhibit dissipation during coherent operations. We demonstrate resetting a transmon qubit from its first excited state to the ground state in 100 ns, achieving a residual population of 2.7%. The reset time may be further shortened to 27 ns by exploiting the coherent population inversion effect. We further extend the technique to resetting the qubit from its second excited state. Our approach promises scalable implementation of fast reset and qubit protection during control and readout, adding to the toolbox of dissipation engineering.

Submitted 8 January, 2025; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: 8 pages, 4 figures

Journal ref: Phys. Rev. Applied 23, 014012 (2025)

arXiv:2407.20773 [pdf]

UpDown: Programmable fine-grained Events for Scalable Performance on Irregular Applications

Authors: Andronicus Rajasukumar, Jiya Su, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Jose M Monsalve Diaz, Tianchi Zhang, Jianru Ding, Wenyi Wang, Ziyi Zhang, Moubarak Jeje, Henry Hoffmann, Yanjing Li, Andrew A. Chien

Abstract: Applications with irregular data structures, data-dependent control flows and fine-grained data transfers (e.g., real-world graph computations) perform poorly on cache-based systems. We propose the UpDown accelerator that supports fine-grained execution with novel architecture mechanisms: lightweight threading, event-driven scheduling, efficient ultra-short threads, and split-transaction DRAM access with software-controlled synchronization. These hardware primitives support software programmable events, enabling high performance on diverse data structures and algorithms. UpDown also supports scalable performance; hardware replication enables programs to scale up performance. Evaluation results show UpDown's flexibility and scalability enable it to outperform CPUs on graph mining and analytics computations by up to 116-195x geomean speedup and more than 4x speedup over prior accelerators. We show that UpDown generates high memory parallelism (~4.6x over CPU) required for memory intensive graph computations. We present measurements that attribute the performance of UpDown (23x architectural advantage) to its individual architectural mechanisms. Finally, we also analyze the area and power cost of UpDown's mechanisms for software programmability.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 14 pages, 23 figures

arXiv:2407.19625 [pdf, other]

LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

Authors: Taoyu Su, Xinghua Zhang, Jiawei Sheng, Zhenyu Zhang, Tingwen Liu

Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs (MMKGs), whose entities can be associated with relational triples and related images. Most previous studies treat the graph structure as a special modality, and fuse different modality information with separate uni-modal encoders, neglecting valuable relational associations in modalities. Other studies refine each uni-modal information with graph structures, but may introduce unnecessary relations in specific modalities. To this end, we propose a novel local-to-global interaction network for MMEA, termed LoginMEA. Particularly, we first fuse local multi-modal interactions to generate holistic entity semantics and then refine them with global relational interactions of entity neighbors. In this design, the uni-modal information is fused adaptively, and can be refined with relations accordingly. To enrich local interactions of multi-modal entity information, we devise modality weights and low-rank interactive fusion, allowing diverse impacts and element-level interactions among modalities. To capture global interactions of graph structures, we adopt relation reflection graph attention networks, which fully capture relational associations between entities. Extensive experiments demonstrate superior results of our method over 5 cross-KG or bilingual benchmark datasets, indicating the effectiveness of capturing local and global interactions.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Accepted by ECAI 2024

arXiv:2407.19302 [pdf, other]

IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

Authors: Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu

Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information heavily relying on the automatically-learned fusion module, rarely suppressing the redundant information for MMEA explicitly. To this end, we explore variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes the alignment-relevant information and suppresses the alignment-irrelevant information in generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, limiting the misleading clues in refining modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods, and also shows promising and robust performance in low-resource and high-noise data scenarios.

Submitted 27 July, 2024; originally announced July 2024.

Comments: Accepted by ACM MM 2024

arXiv:2407.18955 [pdf, other]

Real Face Video Animation Platform

Authors: Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

Abstract: In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gradio framework, our platform ensures excellent interactivity and user-friendliness. Users can input a real face video or image and select their desired cartoon style. The system will then automatically analyze facial features, execute necessary preprocessing, and invoke appropriate models to generate expressive anime-style faces. We employ a variety of models within our system to process the HDTF dataset, thereby creating an animated facial video dataset.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08949 [pdf, other]

One-Shot Pose-Driving Face Animation Platform

Authors: He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

Abstract: The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-shot and more consecutive talking head videos, we refine an existing Image2Video model by integrating a Face Locator and Motion Frame mechanism. We subsequently optimize the model using extensive human face video datasets, significantly enhancing its ability to produce high-quality and expressive talking head videos. Additionally, we develop a demo platform using the Gradio framework, which streamlines the process, enabling users to quickly create customized talking head videos.

Submitted 11 July, 2024; originally announced July 2024.

Showing 1–50 of 256 results for author: Sue, T