Search | arXiv e-print repository

A Single Image Is All You Need: Zero-Shot Anomaly Localization Without Training Data

Authors: Mehrdad Moradi, Shengzhe Chen, Hao Yan, Kamran Paynabar

Abstract: Anomaly detection in images is typically addressed by learning from collections of training data or relying on reference samples. In many real-world scenarios, however, such training data may be unavailable, and only the test image itself is provided. We address this zero-shot setting by proposing a single-image anomaly localization method that leverages the inductive bias of convolutional neural… ▽ More Anomaly detection in images is typically addressed by learning from collections of training data or relying on reference samples. In many real-world scenarios, however, such training data may be unavailable, and only the test image itself is provided. We address this zero-shot setting by proposing a single-image anomaly localization method that leverages the inductive bias of convolutional neural networks, inspired by Deep Image Prior (DIP). Our method is named Single Shot Decomposition Network (SSDnet). Our key assumption is that natural images often exhibit unified textures and patterns, and that anomalies manifest as localized deviations from these repetitive or stochastic patterns. To learn the deep image prior, we design a patch-based training framework where the input image is fed directly into the network for self-reconstruction, rather than mapping random noise to the image as done in DIP. To avoid the model simply learning an identity mapping, we apply masking, patch shuffling, and small Gaussian noise. In addition, we use a perceptual loss based on inner-product similarity to capture structure beyond pixel fidelity. Our approach needs no external training data, labels, or references, and remains robust in the presence of noise or missing pixels. SSDnet achieves 0.99 AUROC and 0.60 AUPRC on MVTec-AD and 0.98 AUROC and 0.67 AUPRC on the fabric dataset, outperforming state-of-the-art methods. The implementation code will be released at https://github.com/mehrdadmoradi124/SSDnet △ Less

Submitted 22 September, 2025; originally announced September 2025.

Comments: 12 pages, 10 figures, 1 table. Preprint submitted to a CVF conference

MSC Class: 62H35; 68T07; 62M40; 68T45 ACM Class: I.2.6; I.2.10; I.4.6; I.4.8; I.5.1; I.5.4

arXiv:2509.12839 [pdf, ps, other]

Spatial Correlation and Degrees of Freedom in Arched HMIMO Arrays: A Closed-Form Analysis

Authors: Liuxun Xue, Shu Sun, Hangsong Yan

Abstract: This paper presents a closed-form analysis of spatial correlation and degrees of freedom (DoF) for arched holographic multiple-input multiple-output (HMIMO) arrays, which can be viewed as a special form of fluid antenna systems (FAS) when their geometry is fluidically adaptable. Unlike traditional planar configurations, practical HMIMO surfaces may exhibit curvature, significantly influencing thei… ▽ More This paper presents a closed-form analysis of spatial correlation and degrees of freedom (DoF) for arched holographic multiple-input multiple-output (HMIMO) arrays, which can be viewed as a special form of fluid antenna systems (FAS) when their geometry is fluidically adaptable. Unlike traditional planar configurations, practical HMIMO surfaces may exhibit curvature, significantly influencing their spatial characteristics and performance. We derive exact correlation expressions for both arched uniform linear arrays and arched uniform rectangular arrays, capturing curvature effects under far field propagation. Our results reveal that isotropic scattering results in DoF being dominated by the maximum span of the HMIMO array, such that shape effects are weakened, and bending does not significantly reduce the available spatial DoF. Numerical simulations validate the accuracy of the closed-form formulas and demonstrate the robustness of DoF against curvature variations, supporting flexible array designs. These findings offer fundamental insights into geometry-aware optimization for next-generation HMIMO/FAS systems and pave the way for practical implementations of curved HMIMO arrays. △ Less

Submitted 16 September, 2025; originally announced September 2025.

arXiv:2508.20479 [pdf, ps, other]

Joint Contact Planning for Navigation and Communication in GNSS-Libration Point Systems

Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Xiyun Hou, Haohan Li, Yuxuan Miao, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang, Linshan Xue

Abstract: Deploying satellites at Earth-Moon Libration Points (LPs) addresses the inherent deep-space coverage gaps of low-altitude GNSS constellations. Integrating LP satellites with GNSS into a joint constellation enables a more robust and comprehensive Positioning, Navigation, and Timing (PNT) system, while also extending navigation and communication services to spacecraft operating in cislunar space (i.… ▽ More Deploying satellites at Earth-Moon Libration Points (LPs) addresses the inherent deep-space coverage gaps of low-altitude GNSS constellations. Integrating LP satellites with GNSS into a joint constellation enables a more robust and comprehensive Positioning, Navigation, and Timing (PNT) system, while also extending navigation and communication services to spacecraft operating in cislunar space (i.e., users). However, the long propagation delays between LP satellites, users, and GNSS satellites result in significantly different link durations compared to those within the GNSS constellation. Scheduling inter-satellite links (ISLs) is a core task of Contact Plan Design (CPD). Existing CPD approaches focus exclusively on GNSS constellations, assuming uniform link durations, and thus cannot accommodate the heterogeneous link timescales present in a joint GNSS-LP system. To overcome this limitation, we introduce a Joint CPD (J-CPD) scheme tailored to handle ISLs with differing duration units across integrated constellations. The key contributions of J-CPD are: (i):introduction of LongSlots (Earth-Moon scale links) and ShortSlots (GNSS-scale links); (ii):a hierarchical and crossed CPD process for scheduling LongSlots and ShortSlots ISLs; (iii):an energy-driven link scheduling algorithm adapted to the CPD process. Simulations on a joint BeiDou-LP constellation demonstrate that J-CPD surpasses the baseline FCP method in both delay and ranging coverage, while maintaining high user satisfaction and enabling tunable trade-offs through adjustable potential-energy parameters. To our knowledge, this is the first CPD framework to jointly optimize navigation and communication in GNSS-LP systems, representing a key step toward unified and resilient deep-space PNT architectures. △ Less

Submitted 28 August, 2025; originally announced August 2025.

Comments: 15 pages, 8 figures

arXiv:2507.16190 [pdf, ps, other]

LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement

Authors: Haoyin Yan, Jie Zhang, Chengqian Jiang, Shuang Zhang

Abstract: Multichannel speech enhancement (SE) aims to restore clean speech from noisy measurements by leveraging spatiotemporal signal features. In ad-hoc array conditions, microphone invariance (MI) requires systems to handle different microphone numbers and array geometries. From a practical perspective, multichannel recordings inevitably increase the computational burden for edge-device applications, hi… ▽ More Multichannel speech enhancement (SE) aims to restore clean speech from noisy measurements by leveraging spatiotemporal signal features. In ad-hoc array conditions, microphone invariance (MI) requires systems to handle different microphone numbers and array geometries. From a practical perspective, multichannel recordings inevitably increase the computational burden for edge-device applications, highlighting the necessity of lightweight and efficient deployments. In this work, we propose a lightweight attentive beamforming network (LABNet) to integrate MI in a low-complexity real-time SE system. We design a three-stage framework for efficient intra-channel modeling and inter-channel interaction. A cross-channel attention module is developed to aggregate features from each channel selectively. Experimental results demonstrate our LABNet achieves impressive performance with ultra-light resource overhead while maintaining the MI, indicating great potential for ad-hoc array processing. The code is available:https://github.com/Jokejiangv/LABNet.git △ Less

Submitted 26 August, 2025; v1 submitted 21 July, 2025; originally announced July 2025.

arXiv:2507.10086 [pdf, ps, other]

doi 10.1109/COMPEL57166.2025.11121219

Fast-Response Variable-Frequency Series-Capacitor Buck VRM Through Integrated Control Approaches

Authors: Guanyu Qian, Haoxian Yan, Xiaofan Cui

Abstract: Fast-response voltage regulation is essential for data-center Voltage Regulation Modules (VRMs) powering Artificial Intelligence (AI) workloads, which exhibit both small-amplitude fluctuations and abrupt full-load steps. This paper introduces a control scheme that integrates a linear controller and a nonlinear controller for variable-frequency Series-Capacitor Buck (SCB) converters. First, an accu… ▽ More Fast-response voltage regulation is essential for data-center Voltage Regulation Modules (VRMs) powering Artificial Intelligence (AI) workloads, which exhibit both small-amplitude fluctuations and abrupt full-load steps. This paper introduces a control scheme that integrates a linear controller and a nonlinear controller for variable-frequency Series-Capacitor Buck (SCB) converters. First, an accurate small-signal model is derived via a Switching-Synchronized Sampled State-Space (5S) framework, yielding discrete-time transfer functions and root-locus insights for direct digital design. A critical concern for SCB converters is series-capacitor oscillation during heavy load steps if the strict switching sequence is not maintained. To accelerate large-signal transients, a time-optimal control strategy based on Pontryagins Maximum Principle (PMP) relaxes the switching constraints to compute time-optimal switching sequences. A transition logic is then proposed to integrate the high-bandwidth small-signal controller and the large-signal controller. Simulations demonstrate a rapid output voltage recovery under a heavy load step-up, over ten times faster than a linear controller-only design. Preliminary hardware tests indicate a stable rejection to heavy load disturbances with zero steady-state error. △ Less

Submitted 14 July, 2025; originally announced July 2025.

Comments: 8 pages, 10 figures, IEEE conference style. to be published on IEEE Workshop on Control and Modeling of Power Electronics (COMPEL 2025)

arXiv:2507.10063 [pdf, ps, other]

Deep Learning-Based Beamforming Design Using Target Beam Patterns

Authors: Hongpu Zhang, Shu Sun, Hangsong Yan, Jianhua Mo

Abstract: This paper proposes a deep learning-based beamforming design framework that directly maps a target beam pattern to optimal beamforming vectors across multiple antenna array architectures, including digital, analog, and hybrid beamforming. The proposed method employs a lightweight encoder-decoder network where the encoder compresses the complex beam pattern into a low-dimensional feature vector and… ▽ More This paper proposes a deep learning-based beamforming design framework that directly maps a target beam pattern to optimal beamforming vectors across multiple antenna array architectures, including digital, analog, and hybrid beamforming. The proposed method employs a lightweight encoder-decoder network where the encoder compresses the complex beam pattern into a low-dimensional feature vector and the decoder reconstructs the beamforming vector while satisfying hardware constraints. To address training challenges under diverse and limited channel station information (CSI) conditions, a two-stage training process is introduced, which consists of an offline pre-training for robust feature extraction using an auxiliary module, followed by online training of the decoder with a composite loss function that ensures alignment between the synthesized and target beam patterns in terms of the main lobe shape and side lobe suppression. Simulation results based on NYUSIM-generated channels show that the proposed method can achieve spectral efficiency close to that of fully digital beamforming under limited CSI and outperforms representative existing methods. △ Less

Submitted 14 July, 2025; originally announced July 2025.

Comments: 6 pages, 5 figures

arXiv:2507.02445 [pdf, ps, other]

IGDNet: Zero-Shot Robust Underexposed Image Enhancement via Illumination-Guided and Denoising

Authors: Hailong Yan, Junjian Huang, Tingwen Huang

Abstract: Current methods for restoring underexposed images typically rely on supervised learning with paired underexposed and well-illuminated images. However, collecting such datasets is often impractical in real-world scenarios. Moreover, these methods can lead to over-enhancement, distorting well-illuminated regions. To address these issues, we propose IGDNet, a Zero-Shot enhancement method that operate… ▽ More Current methods for restoring underexposed images typically rely on supervised learning with paired underexposed and well-illuminated images. However, collecting such datasets is often impractical in real-world scenarios. Moreover, these methods can lead to over-enhancement, distorting well-illuminated regions. To address these issues, we propose IGDNet, a Zero-Shot enhancement method that operates solely on a single test image, without requiring guiding priors or training data. IGDNet exhibits strong generalization ability and effectively suppresses noise while restoring illumination. The framework comprises a decomposition module and a denoising module. The former separates the image into illumination and reflection components via a dense connection network, while the latter enhances non-uniformly illuminated regions using an illumination-guided pixel adaptive correction method. A noise pair is generated through downsampling and refined iteratively to produce the final result. Extensive experiments on four public datasets demonstrate that IGDNet significantly improves visual quality under complex lighting conditions. Quantitative results on metrics like PSNR (20.41dB) and SSIM (0.860dB) show that it outperforms 14 state-of-the-art unsupervised methods. The code will be released soon. △ Less

Submitted 3 July, 2025; originally announced July 2025.

Comments: Submitted to IEEE Transactions on Artificial Intelligence (TAI) on Oct.31, 2024

arXiv:2506.12537 [pdf, ps, other]

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

Authors: Xiaoran Fan, Zhichao Sun, Yangfan Gao, Jingfei Xiong, Hang Yan, Yifei Cao, Jiajun Sun, Shuo Li, Zhihao Zhang, Zhiheng Xi, Yuhao Zhou, Senjie Jin, Changhao Jiang, Junjie Ye, Ming Zhang, Rui Zheng, Zhenhua Han, Yunke Zhang, Demei Yan, Shaokang Dong, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the role of speech tokenizer designs in LLM-centric SLMs, augmented by speech heads and speaker modeling. We compare coupled, semi-de… ▽ More Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the role of speech tokenizer designs in LLM-centric SLMs, augmented by speech heads and speaker modeling. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12$\times$ faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency. △ Less

Submitted 5 August, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

arXiv:2506.11616 [pdf, ps, other]

Wi-CBR: Salient-aware Adaptive WiFi Sensing for Cross-domain Behavior Recognition

Authors: Ruobei Zhang, Shengeng Tang, Huan Yan, Xiang Zhang, Jiabao Guo

Abstract: The challenge in WiFi-based cross-domain Behavior Recognition lies in the significant interference of domain-specific signals on gesture variation. However, previous methods alleviate this interference by mapping the phase from multiple domains into a common feature space. If the Doppler Frequency Shift (DFS) signal is used to dynamically supplement the phase features to achieve better generalizat… ▽ More The challenge in WiFi-based cross-domain Behavior Recognition lies in the significant interference of domain-specific signals on gesture variation. However, previous methods alleviate this interference by mapping the phase from multiple domains into a common feature space. If the Doppler Frequency Shift (DFS) signal is used to dynamically supplement the phase features to achieve better generalization, enabling model to not only explore a wider feature space but also avoid potential degradation of gesture semantic information. Specifically, we propose a novel Salient-aware Adaptive WiFi Sensing for Cross-domain Behavior Recognition (Wi-CBR}, which constructs a dual-branch self-attention module that captures temporal features from phase information reflecting dynamic path length variations, while extracting spatial features from DFS correlated with motion velocity. Moreover, we design a Saliency Guidance Module that employs group attention mechanisms to mine critical activity features, and utilizes gating mechanisms to optimize information entropy, facilitating feature fusion and enabling effective interaction between salient and non-salient behavior characteristics. Extensive experiments on two large-scale public datasets (Widar3.0 and XRF55) demonstrate the superior performance of our method in both in-domain and cross-domain scenarios. △ Less

Submitted 4 August, 2025; v1 submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.09377 [pdf, ps, other]

An Interpretable Two-Stage Feature Decomposition Method for Deep Learning-based SAR ATR

Authors: Chenwei Wang, Renjie Xu, Congwen Wu, Cunyi Yin, Ziyun Liao, Deqing Mao, Sitong Zhang, Hong Yan

Abstract: Synthetic aperture radar automatic target recognition (SAR ATR) has seen significant performance improvements with deep learning. However, the black-box nature of deep SAR ATR introduces low confidence and high risks in decision-critical SAR applications, hindering practical deployment. To address this issue, deep SAR ATR should provide an interpretable reasoning basis $r_b$ and logic $λ_w$, formi… ▽ More Synthetic aperture radar automatic target recognition (SAR ATR) has seen significant performance improvements with deep learning. However, the black-box nature of deep SAR ATR introduces low confidence and high risks in decision-critical SAR applications, hindering practical deployment. To address this issue, deep SAR ATR should provide an interpretable reasoning basis $r_b$ and logic $λ_w$, forming the reasoning logic $\sum_{i} {{r_b^i} \times {λ_w^i}} =pred$ behind the decisions. Therefore, this paper proposes a physics-based two-stage feature decomposition method for interpretable deep SAR ATR, which transforms uninterpretable deep features into attribute scattering center components (ASCC) with clear physical meanings. First, ASCCs are obtained through a clustering algorithm. To extract independent physical components from deep features, we propose a two-stage decomposition method. In the first stage, a feature decoupling and discrimination module separates deep features into approximate ASCCs with global discriminability. In the second stage, a multilayer orthogonal non-negative matrix tri-factorization (MLO-NMTF) further decomposes the ASCCs into independent components with distinct physical meanings. The MLO-NMTF elegantly aligns with the clustering algorithms to obtain ASCCs. Finally, this method ensures both an interpretable reasoning process and accurate recognition results. Extensive experiments on four benchmark datasets confirm its effectiveness, showcasing the method's interpretability, robust recognition performance, and strong generalization capability. △ Less

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2505.24576 [pdf, ps, other]

A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement

Authors: Jie Zhang, Haoyin Yan, Xiaofei Li

Abstract: It is promising to design a single model that can suppress various distortions and improve speech quality, i.e., universal speech enhancement (USE). Compared to supervised learning-based predictive methods, diffusion-based generative models have shown greater potential due to the generative capacities from degraded speech with severely damaged information. However, artifacts may be introduced in h… ▽ More It is promising to design a single model that can suppress various distortions and improve speech quality, i.e., universal speech enhancement (USE). Compared to supervised learning-based predictive methods, diffusion-based generative models have shown greater potential due to the generative capacities from degraded speech with severely damaged information. However, artifacts may be introduced in highly adverse conditions, and diffusion models often suffer from a heavy computational burden due to many steps for inference. In order to jointly leverage the superiority of prediction and generation and overcome the respective defects, in this work we propose a universal speech enhancement model called PGUSE by combining predictive and generative modeling. Our model consists of two branches: the predictive branch directly predicts clean samples from degraded signals, while the generative branch optimizes the denoising objective of diffusion models. We utilize the output fusion and truncated diffusion scheme to effectively integrate predictive and generative modeling, where the former directly combines results from both branches and the latter modifies the reverse diffusion process with initial estimates from the predictive branch. Extensive experiments on several datasets verify the superiority of the proposed model over state-of-the-art baselines, demonstrating the complementarity and benefits of combining predictive and generative modeling. △ Less

Submitted 30 May, 2025; originally announced May 2025.

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing

arXiv:2505.24151 [pdf]

Channel Knowledge Maps for 6G Wireless Networks: Construction, Applications, and Future Challenges

Authors: Xingchen Liu, Shu Sun, Meixia Tao, Aryan Kaushik, Hangsong Yan

Abstract: The advent of 6G wireless networks promises unprecedented connectivity, supporting ultra-high data rates, low latency, and massive device connectivity. However, these ambitious goals introduce significant challenges, particularly in channel estimation due to complex and dynamic propagation environments. This paper explores the concept of channel knowledge maps (CKMs) as a solution to these challen… ▽ More The advent of 6G wireless networks promises unprecedented connectivity, supporting ultra-high data rates, low latency, and massive device connectivity. However, these ambitious goals introduce significant challenges, particularly in channel estimation due to complex and dynamic propagation environments. This paper explores the concept of channel knowledge maps (CKMs) as a solution to these challenges. CKMs enable environment-aware communications by providing location-specific channel information, reducing reliance on real-time pilot measurements. We categorize CKM construction techniques into measurement-based, model-based, and hybrid methods, and examine their key applications in integrated sensing and communication systems, beamforming, trajectory optimization of unmanned aerial vehicles, base station placement, and resource allocation. Furthermore, we discuss open challenges and propose future research directions to enhance the robustness, accuracy, and scalability of CKM-based systems in the evolving 6G landscape. △ Less

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.21216 [pdf, ps, other]

CiUAV: A Multi-Task 3D Indoor Localization System for UAVs based on Channel State Information

Authors: Cunyi Yin, Chenwei Wang, Jing Chen, Hao Jiang, Xiren Miao, Shaocong Zheng Zhenghua Chen Senior, Hong Yan

Abstract: Accurate indoor positioning for unmanned aerial vehicles (UAVs) is critical for logistics, surveillance, and emergency response applications, particularly in GPS-denied environments. Existing indoor localization methods, including optical tracking, ultra-wideband, and Bluetooth-based systems, face cost, accuracy, and robustness trade-offs, limiting their practicality for UAV navigation. This paper… ▽ More Accurate indoor positioning for unmanned aerial vehicles (UAVs) is critical for logistics, surveillance, and emergency response applications, particularly in GPS-denied environments. Existing indoor localization methods, including optical tracking, ultra-wideband, and Bluetooth-based systems, face cost, accuracy, and robustness trade-offs, limiting their practicality for UAV navigation. This paper proposes CiUAV, a novel 3D indoor localization system designed for UAVs, leveraging channel state information (CSI) obtained from low-cost ESP32 IoT-based sensors. The system incorporates a dynamic automatic gain control (AGC) compensation algorithm to mitigate noise and stabilize CSI signals, significantly enhancing the robustness of the measurement. Additionally, a multi-task 3D localization model, Sensor-in-Sample (SiS), is introduced to enhance system robustness by addressing challenges related to incomplete sensor data and limited training samples. SiS achieves this by joint training with varying sensor configurations and sample sizes, ensuring reliable performance even in resource-constrained scenarios. Experiment results demonstrate that CiUAV achieves a LMSE localization error of 0.2629 m in a 3D space, achieving good accuracy and robustness. The proposed system provides a cost-effective and scalable solution, demonstrating its usefulness for UAV applications in resource-constrained indoor environments. △ Less

Submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.09521 [pdf, ps, other]

Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net

Authors: Dongyi He, Shiyang Li, Bin Jiang, He Yan

Abstract: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain Convolutional N… ▽ More High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain Convolutional Neural Networks (CNNs) that fail to capture cross-channel time-frequency cues or on heavy transformer/Generative Adversarial Network (GAN) decoders that strain memory and stability. To address these limitations, we propose Spec2VolCAMU-Net, a lightweight architecture featuring a Multi-directional Time-Frequency Convolutional Attention Encoder for rich feature extraction and a Vision-Mamba U-Net decoder that uses linear-time state-space blocks for efficient long-range spatial modelling. We frame the goal of this work as establishing a new state of the art in the spatial fidelity of single-volume reconstruction, a foundational prerequisite for the ultimate aim of generating temporally coherent fMRI time series. Trained end-to-end with a hybrid SSI-MSE loss, Spec2VolCAMU-Net achieves state-of-the-art fidelity on three public benchmarks, recording Structural Similarity Index (SSIM) of 0.693 on NODDI, 0.725 on Oddball and 0.788 on CN-EPFL, representing improvements of 14.5%, 14.9%, and 16.9% respectively over previous best SSIM scores. Furthermore, it achieves competitive Signal-to-Noise Ratio (PSNR) scores, particularly excelling on the CN-EPFL dataset with a 4.6% improvement over the previous best PSNR, thus striking a better balance in reconstruction quality. The proposed model is lightweight and efficient, making it suitable for real-time applications in clinical and research settings. The code is available at https://github.com/hdy6438/Spec2VolCAMU-Net. △ Less

Submitted 10 September, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

arXiv:2505.08038 [pdf, other]

Statistical CSI-Based Distributed Precoding Design for OFDM-Cooperative Multi-Satellite Systems

Authors: Yafei Wang, Vu Nguyen Ha, Konstantinos Ntontin, Hong Yan, Wenjin Wang, Symeon Chatzinotas, Björn Ottersten

Abstract: This paper investigates the design of distributed precoding for multi-satellite massive MIMO transmissions. We first conduct a detailed analysis of the transceiver model, in which delay and Doppler precompensation is introduced to ensure coherent transmission. In this analysis, we examine the impact of precompensation errors on the transmission model, emphasize the near-independence of inter-satel… ▽ More This paper investigates the design of distributed precoding for multi-satellite massive MIMO transmissions. We first conduct a detailed analysis of the transceiver model, in which delay and Doppler precompensation is introduced to ensure coherent transmission. In this analysis, we examine the impact of precompensation errors on the transmission model, emphasize the near-independence of inter-satellite interference, and ultimately derive the received signal model. Based on such signal model, we formulate an approximate expected rate maximization problem that considers both statistical channel state information (sCSI) and compensation errors. Unlike conventional approaches that recast such problems as weighted minimum mean square error (WMMSE) minimization, we demonstrate that this transformation fails to maintain equivalence in the considered scenario. To address this, we introduce an equivalent covariance decomposition-based WMMSE (CDWMMSE) formulation derived based on channel covariance matrix decomposition. Taking advantage of the channel characteristics, we develop a low-complexity decomposition method and propose an optimization algorithm. To further reduce computational complexity, we introduce a model-driven scalable deep learning (DL) approach that leverages the equivariance of the mapping from sCSI to the unknown variables in the optimal closed-form solution, enhancing performance through novel dense Transformer network and scaling-invariant loss function design. Simulation results validate the effectiveness and robustness of the proposed method in some practical scenarios. We also demonstrate that the DL approach can adapt to dynamic settings with varying numbers of users and satellites. △ Less

Submitted 12 May, 2025; originally announced May 2025.

arXiv:2503.22867 [pdf, other]

Markov Potential Game Construction and Multi-Agent Reinforcement Learning with Applications to Autonomous Driving

Authors: Huiwen Yan, Mushuang Liu

Abstract: Markov games (MGs) serve as the mathematical foundation for multi-agent reinforcement learning (MARL), enabling self-interested agents to learn their optimal policies while interacting with others in a shared environment. However, due to the complexities of an MG problem, seeking (Markov perfect) Nash equilibrium (NE) is often very challenging for a general-sum MG. Markov potential games (MPGs), w… ▽ More Markov games (MGs) serve as the mathematical foundation for multi-agent reinforcement learning (MARL), enabling self-interested agents to learn their optimal policies while interacting with others in a shared environment. However, due to the complexities of an MG problem, seeking (Markov perfect) Nash equilibrium (NE) is often very challenging for a general-sum MG. Markov potential games (MPGs), which are a special class of MGs, have appealing properties such as guaranteed existence of pure NEs and guaranteed convergence of gradient play algorithms, thereby leading to desirable properties for many MARL algorithms in their NE-seeking processes. However, the question of how to construct MPGs has been open. This paper provides sufficient conditions on the reward design and on the Markov decision process (MDP), under which an MG is an MPG. Numerical results on autonomous driving applications are reported. △ Less

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.18353 [pdf, other]

Contact Plan Design for Cross-Linked GNSSs: An ILP Approach for Extended Applications

Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Xiyun Hou, Haohan Li, Yuxuan Miao, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang

Abstract: Global Navigation Satellite Systems (GNSS) employ inter-satellite links (ISLs) to reduce dependency on ground stations, enabling precise ranging and communication across satellites. Beyond their traditional role, ISLs can support extended applications, including providing navigation and communication services to external entities. However, designing effective contact plan design (CPD) schemes for… ▽ More Global Navigation Satellite Systems (GNSS) employ inter-satellite links (ISLs) to reduce dependency on ground stations, enabling precise ranging and communication across satellites. Beyond their traditional role, ISLs can support extended applications, including providing navigation and communication services to external entities. However, designing effective contact plan design (CPD) schemes for these multifaceted ISLs, operating under a polling time-division duplex (PTDD) framework, remains a critical challenge. Existing CPD approaches focus solely on meeting GNSS satellites' internal ranging and communication demands, neglecting their extended applications. This paper introduces the first CPD scheme capable of supporting extended GNSS ISLs. By modeling GNSS requirements and designing a tailored service process, our approach ensures the allocation of essential resources for internal operations while accommodating external user demands. Based on the BeiDou constellation, simulation results demonstrate the proposed scheme's efficacy in maintaining core GNSS functionality while providing extended ISLs on a best-effort basis. Additionally, the results highlight the significant impact of GNSS ISLs in enhancing orbit determination and clock synchronization for the Earth-Moon libration point constellation, underscoring the importance of extended GNSS ISL applications. △ Less

Submitted 24 March, 2025; originally announced March 2025.

Comments: 18 pages, 13 figures

arXiv:2503.18340 [pdf, ps, other]

Optimized Contact Plan Design for Reflector and Phased Array Terminals in Cislunar Space Networks

Authors: Huan Yan, Juan A. Fraire, Ziqi Yang, Kanglian Zhao, Wenfeng Li, Jinjun Zheng, Chengbin Kang, Huichao Zhou, Xinuo Chang, Lu Wang, Linshan Xue

Abstract: Cislunar space is emerging as a critical domain for human exploration, requiring robust infrastructure to support spatial users-spacecraft with navigation and communication demands. Deploying satellites at Earth-Moon three-body orbits offers an effective solution to construct cislunar space infrastructure (CLSI). However, scheduling satellite links to serve users necessitates an appropriate contac… ▽ More Cislunar space is emerging as a critical domain for human exploration, requiring robust infrastructure to support spatial users-spacecraft with navigation and communication demands. Deploying satellites at Earth-Moon three-body orbits offers an effective solution to construct cislunar space infrastructure (CLSI). However, scheduling satellite links to serve users necessitates an appropriate contact plan design (CPD) scheme. Existing CPD schemes focus solely on inter-satellite link scheduling, overlooking their role in providing services to users. This paper introduces a CPD scheme that considers two classes of satellite transponders: Reflector Links (RL) for high-volume data transfers and Phased Array Links (PL) for fast switching and navigation services. Our approach supports both satellites and spatial users in cislunar space. Simulations validate the scheme, demonstrating effective support for user while meeting satellite ranging and communication requirements. These findings provide essential insights for developing future Cislunar Space Infrastructures. △ Less

Submitted 28 August, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

Comments: 12 pages, 9 figures

arXiv:2503.02261 [pdf, other]

Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution

Authors: Zelin Li, Chenwei Wang, Zhaoke Huang, Yiming MA, Cunmin Zhao, Zhongying Zhao, Hong Yan

Abstract: 3D fluorescence microscopy is essential for understanding fundamental life processes through long-term live-cell imaging. However, due to inherent issues in imaging principles, it faces significant challenges including spatially varying noise and anisotropic resolution, where the axial resolution lags behind the lateral resolution up to 4.5 times. Meanwhile, laser power is kept low to maintain cel… ▽ More 3D fluorescence microscopy is essential for understanding fundamental life processes through long-term live-cell imaging. However, due to inherent issues in imaging principles, it faces significant challenges including spatially varying noise and anisotropic resolution, where the axial resolution lags behind the lateral resolution up to 4.5 times. Meanwhile, laser power is kept low to maintain cell viability, leading to inaccessible low-noise and high-resolution paired ground truth (GT). To tackle these limitations, a dual Cycle-consistent Diffusion is proposed to effectively mine intra-volume imaging priors within 3D cell volumes in an unsupervised manner, i.e., Volume Tells (VTCD), achieving de-noising and super-resolution (SR) simultaneously. Specifically, a spatially iso-distributed denoiser is designed to exploit the noise distribution consistency between adjacent low-noise and high-noise regions within the 3D cell volume, suppressing the spatially varying noise. Then, in light of the structural consistency of the cell volume, a cross-plane global-propagation SR module propagates high-resolution details from the XY plane into adjacent regions in the XZ and YZ planes, progressively enhancing resolution across the entire 3D cell volume. Experimental results on 10 in vivo cellular dataset demonstrate high improvements in both denoising and super-resolution, with axial resolution enhanced from ~ 430 nm to ~ 90 nm. △ Less

Submitted 3 March, 2025; originally announced March 2025.

Comments: Accepted on CVPR 2025

arXiv:2501.13006 [pdf, other]

Terahertz Integrated Sensing Communications and Powering for 6G Wireless Networks

Authors: Hua Yan, Yunfei Chen

Abstract: The terahertz (THz) band has attracted significant interest for future wireless networks. In this paper, a THz integrated sensing communications and powering (THz-ISCAP) system, where sensing is leveraged to enhance communications and powering, is studied. For a given total amount of time, we aim to determine an optimal time allocation for sensing to improve the efficiency of communications and po… ▽ More The terahertz (THz) band has attracted significant interest for future wireless networks. In this paper, a THz integrated sensing communications and powering (THz-ISCAP) system, where sensing is leveraged to enhance communications and powering, is studied. For a given total amount of time, we aim to determine an optimal time allocation for sensing to improve the efficiency of communications and powering, along with an optimal power splitting ratio to balance these two functionalities. This is achieved by maximizing between communications and powering either the achievable rate or harvested energy while ensuring a minimum requirement on the other. Numerical results indicate that the optimal system performance can be achieved by jointly optimizing the sensing time allocation and the power splitting ratio. Additionally, the results reveal the effects of various factors, such as THz frequencies and antenna aperture sizes, on the system performance. This study provides some interesting results to offer a new perspective for the research on THz-ISCAP. △ Less

Submitted 22 January, 2025; originally announced January 2025.

Comments: 12 pages

arXiv:2501.06098 [pdf, ps, other]

doi 10.1145/3746027.3754825

ELFATT: Efficient Linear Fast Attention for Vision Transformers

Authors: Chong Wu, Maolin Che, Renjie Xu, Zhuoheng Ran, Hong Yan

Abstract: The attention mechanism is the key to the success of transformers in different machine learning tasks. However, the quadratic complexity with respect to the sequence length of the vanilla softmax-based attention mechanism becomes the major bottleneck for the application of long sequence tasks, such as vision tasks. Although various efficient linear attention mechanisms have been proposed, they nee… ▽ More The attention mechanism is the key to the success of transformers in different machine learning tasks. However, the quadratic complexity with respect to the sequence length of the vanilla softmax-based attention mechanism becomes the major bottleneck for the application of long sequence tasks, such as vision tasks. Although various efficient linear attention mechanisms have been proposed, they need to sacrifice performance to achieve high efficiency. What's more, memory-efficient methods, such as FlashAttention-1-3, still have quadratic computation complexity which can be further improved. In this paper, we propose a novel efficient linear fast attention (ELFATT) mechanism to achieve low memory input/output operations, linear computational complexity, and high performance at the same time. ELFATT offers 4-7x speedups over the vanilla softmax-based attention mechanism in high-resolution vision tasks without losing performance. ELFATT is FlashAttention friendly. Using FlashAttention-2 acceleration, ELFATT still offers 2-3x speedups over the vanilla softmax-based attention mechanism on high-resolution vision tasks without losing performance. Even in some non-vision tasks of long-range arena, ELFATT still achieves leading performance and offers 1.2-2.3x speedups over FlashAttention-2. Even on edge GPUs, ELFATT still offers 1.6x to 2.0x speedups compared to state-of-the-art attention mechanisms in various power modes from 5W to 60W. Furthermore, ELFATT can be used to enhance and accelerate diffusion tasks directly without training. △ Less

Submitted 2 August, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

Comments: Accepted by ACM International Conference on Multimedia (MM '25) [The version includes Appendix]

arXiv:2501.00378 [pdf, ps, other]

doi 10.1016/j.neunet.2025.107927

STARFormer: A Novel Spatio-Temporal Aggregation Reorganization Transformer of FMRI for Brain Disorder Diagnosis

Authors: Wenhao Dong, Yueyang Li, Weiming Zeng, Lei Chen, Hongjie Yan, Wai Ting Siok, Nizhuan Wang

Abstract: Many existing methods that use functional magnetic resonance imaging (fMRI) classify brain disorders, such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD), often overlook the integration of spatial and temporal dependencies of the blood oxygen level-dependent (BOLD) signals, which may lead to inaccurate or imprecise classification results. To solve this proble… ▽ More Many existing methods that use functional magnetic resonance imaging (fMRI) classify brain disorders, such as autism spectrum disorder (ASD) and attention deficit hyperactivity disorder (ADHD), often overlook the integration of spatial and temporal dependencies of the blood oxygen level-dependent (BOLD) signals, which may lead to inaccurate or imprecise classification results. To solve this problem, we propose a Spatio-Temporal Aggregation eorganization ransformer (STARFormer) that effectively captures both spatial and temporal features of BOLD signals by incorporating three key modules. The region of interest (ROI) spatial structure analysis module uses eigenvector centrality (EC) to reorganize brain regions based on effective connectivity, highlighting critical spatial relationships relevant to the brain disorder. The temporal feature reorganization module systematically segments the time series into equal-dimensional window tokens and captures multiscale features through variable window and cross-window attention. The spatio-temporal feature fusion module employs a parallel transformer architecture with dedicated temporal and spatial branches to extract integrated features. The proposed STARFormer has been rigorously evaluated on two publicly available datasets for the classification of ASD and ADHD. The experimental results confirm that the STARFormer achieves state-of-the-art performance across multiple evaluation metrics, providing a more accurate and reliable tool for the diagnosis of brain disorders and biomedical research. The codes are available at: https://github.com/NZWANG/STARFormer. △ Less

Submitted 7 August, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

Journal ref: Neural Networks, 2025

arXiv:2412.12651 [pdf, other]

Shared Attention-based Autoencoder with Hierarchical Fusion-based Graph Convolution Network for sEEG SOZ Identification

Authors: Huachao Yan, Kailing Guo, Shiwei Song, Yihai Dai, Xiaoqiang Wei, Xiaofen Xing, Xiangmin Xu

Abstract: Diagnosing seizure onset zone (SOZ) is a challenge in neurosurgery, where stereoelectroencephalography (sEEG) serves as a critical technique. In sEEG SOZ identification, the existing studies focus solely on the intra-patient representation of epileptic information, overlooking the general features of epilepsy across patients and feature interdependencies between feature elements in each contact si… ▽ More Diagnosing seizure onset zone (SOZ) is a challenge in neurosurgery, where stereoelectroencephalography (sEEG) serves as a critical technique. In sEEG SOZ identification, the existing studies focus solely on the intra-patient representation of epileptic information, overlooking the general features of epilepsy across patients and feature interdependencies between feature elements in each contact site. In order to address the aforementioned challenges, we propose the shared attention-based autoencoder (sATAE). sATAE is trained by sEEG data across all patients, with attention blocks introduced to enhance the representation of interdependencies between feature elements. Considering the spatial diversity of sEEG across patients, we introduce graph-based method for identification SOZ of each patient. However, the current graph-based methods for sEEG SOZ identification rely exclusively on static graphs to model epileptic networks. Inspired by the finding of neuroscience that epileptic network is intricately characterized by the interplay of sophisticated equilibrium between fluctuating and stable states, we design the hierarchical fusion-based graph convolution network (HFGCN) to identify the SOZ. HFGCN integrates the dynamic and static characteristics of epileptic networks through hierarchical weighting across different hierarchies, facilitating a more comprehensive learning of epileptic features and enriching node information for sEEG SOZ identification. Combining sATAE and HFGCN, we perform comprehensive experiments with sATAE-HFGCN on the self-build sEEG dataset, which includes sEEG data from 17 patients with temporal lobe epilepsy. The results show that our method, sATAE-HFGCN, achieves superior performance for identifying the SOZ of each patient, effectively addressing the aforementioned challenges, providing an efficient solution for sEEG-based SOZ identification. △ Less

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.00999 [pdf]

A Compact Hybrid Battery Thermal Management System for Enhanced Cooling

Authors: Zhipeng Lyu, Jinrong Su, Zhe Li, Xiang Li, Hanghang Yan, Lei Chen

Abstract: Hybrid battery thermal management systems (HBTMS) combining active liquid cooling and passive phase change materials (PCM) cooling have shown a potential for the thermal management of lithium-ion batteries. However, the fill volume of coolant and PCM in hybrid cooling systems is limited by the size and weight of the HBTMS at high charge/discharge rates. These limitations result in reduced convecti… ▽ More Hybrid battery thermal management systems (HBTMS) combining active liquid cooling and passive phase change materials (PCM) cooling have shown a potential for the thermal management of lithium-ion batteries. However, the fill volume of coolant and PCM in hybrid cooling systems is limited by the size and weight of the HBTMS at high charge/discharge rates. These limitations result in reduced convective heat transfer from the coolant during discharge. The liquefaction rate of PCM is accelerated and the passive cooling effect is reduced. In this paper, we propose a compact hybrid cooling system with multi-inlet U-shaped microchannels for which the gap between channels is embedded by PCM/aluminum foam for compactness. Nanofluid cooling (NC) technology with better thermal conductivity is used. A pulsed flow function is further developed for enhanced cooling (EC) with reduced power consumption. An experimentally validated thermal-fluid dynamics model is developed to optimize operating conditions including coolant type, cooling direction, channel height, inlet flow rate, and cooling scheme. The results show that the hybrid cooling solution of NC+PCM+EC adopted by HBTMS further reduces the maximum temperature of the Li-ion battery by 3.44°C under a discharge rate of 1C at room temperature of 25°C with only a 5% increase in power consumption, compared to the conventional liquid cooling method for electric vehicles (EV). The average number of battery charges has increased by about 6 to 15 percent. The results of this study can help improve the range as well as driving safety of new energy EV. △ Less

Submitted 1 December, 2024; originally announced December 2024.

arXiv:2411.17734 [pdf]

A Low-Cost Monopulse Receiver with Enhanced Estimation Accuracy Via Deep Neural Network

Authors: Hanxiang Zhang, Saeed Zolfaghary Pour, Hao Yan, Powei Liu, Bayaner Arigong

Abstract: In this paper, a low-cost monopulse receiver with an enhanced direction of arrival (DoA) estimation accuracy via deep neural network (DNN) is proposed. The entire system is composed of a 4-element patch array, a fully planar symmetrical monopulse comparator network, and a down conversion link. Unlike the conventional design topology, the proposed monopulse comparator network is configured by four… ▽ More In this paper, a low-cost monopulse receiver with an enhanced direction of arrival (DoA) estimation accuracy via deep neural network (DNN) is proposed. The entire system is composed of a 4-element patch array, a fully planar symmetrical monopulse comparator network, and a down conversion link. Unlike the conventional design topology, the proposed monopulse comparator network is configured by four novel port-transformation rat-race couplers. In specific, the proposed coupler is designed to symmetrically allocate the sum (Σ) / delta (Δ) ports with input ports, where a 360° phase delay crossover is designed to transform the unsymmetrical ports in the conventional rat-race coupler. This new rat-race coupler resolves the issues in conventional monopulse receiver comparator network design using multilayer and expensive fabrication technology. To verify the design theory, a prototype of the proposed planar monopulse comparator network operating at 2 GHz is designed, simulated, and measured. In addition, the monopulse radiation patterns and direction of arrival are also decently evaluated. To further boost the accuracy of angular information, a deep neural network is introduced to map the misaligned target angular positions in the measurement to the actual physical location under detection. △ Less

Submitted 22 November, 2024; originally announced November 2024.

arXiv:2411.12448 [pdf, other]

Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need

Authors: Kecheng Chen, Pingping Zhang, Hui Liu, Jie Liu, Yibing Liu, Jiaxin Huang, Shiqi Wang, Hong Yan, Haoliang Li

Abstract: We have recently witnessed that ``Intelligence" and `` Compression" are the two sides of the same coin, where the language large model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing need to compress high-resolution images in the current… ▽ More We have recently witnessed that ``Intelligence" and `` Compression" are the two sides of the same coin, where the language large model (LLM) with unprecedented intelligence is a general-purpose lossless compressor for various data modalities. This attribute particularly appeals to the lossless image compression community, given the increasing need to compress high-resolution images in the current streaming media era. Consequently, a spontaneous envision emerges: Can the compression performance of the LLM elevate lossless image compression to new heights? However, our findings indicate that the naive application of LLM-based lossless image compressors suffers from a considerable performance gap compared with existing state-of-the-art (SOTA) codecs on common benchmark datasets. In light of this, we are dedicated to fulfilling the unprecedented intelligence (compression) capacity of the LLM for lossless image compression tasks, thereby bridging the gap between theoretical and practical compression performance. Specifically, we propose P$^{2}$-LLM, a next-pixel prediction-based LLM, which integrates various elaborated insights and methodologies, \textit{e.g.,} pixel-level priors, the in-context ability of LLM, and a pixel-level semantic preservation strategy, to enhance the understanding capacity of pixel sequences for better next-pixel predictions. Extensive experiments on benchmark datasets demonstrate that P$^{2}$-LLM can beat SOTA classical and learned codecs. △ Less

Submitted 21 November, 2024; v1 submitted 19 November, 2024; originally announced November 2024.

arXiv:2409.13285 [pdf, other]

LiSenNet: Lightweight Sub-band and Dual-Path Modeling for Real-Time Speech Enhancement

Authors: Haoyin Yan, Jie Zhang, Cunhang Fan, Yeping Zhou, Peiqi Liu

Abstract: Speech enhancement (SE) aims to extract the clean waveform from noise-contaminated measurements to improve the speech quality and intelligibility. Although learning-based methods can perform much better than traditional counterparts, the large computational complexity and model size heavily limit the deployment on latency-sensitive and low-resource edge devices. In this work, we propose a lightwei… ▽ More Speech enhancement (SE) aims to extract the clean waveform from noise-contaminated measurements to improve the speech quality and intelligibility. Although learning-based methods can perform much better than traditional counterparts, the large computational complexity and model size heavily limit the deployment on latency-sensitive and low-resource edge devices. In this work, we propose a lightweight SE network (LiSenNet) for real-time applications. We design sub-band downsampling and upsampling blocks and a dual-path recurrent module to capture band-aware features and time-frequency patterns, respectively. A noise detector is developed to detect noisy regions in order to perform SE adaptively and save computational costs. Compared to recent higher-resource-dependent baseline models, the proposed LiSenNet can achieve a competitive performance with only 37k parameters (half of the state-of-the-art model) and 56M multiply-accumulate (MAC) operations per second. △ Less

Submitted 20 September, 2024; originally announced September 2024.

Comments: 5 pages, submitted to 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

arXiv:2409.05113 [pdf, other]

Nonlinear Cooperative Output Regulation with Input Delay Compensation

Authors: Shiqi Zheng, Choon Ki Ahn, Xiaowei Jiang, Huaicheng Yan, Peng Shi

Abstract: This paper investigates the cooperative output regulation (COR) of nonlinear multi-agent systems (MASs) with long input delay based on periodic event-triggered mechanism. Compared with other mechanisms, periodic event-triggered control can automatically guarantee a Zeno-free behavior and avoid the continuous monitoring of triggered conditions. First, a new periodic event-triggered distributed obse… ▽ More This paper investigates the cooperative output regulation (COR) of nonlinear multi-agent systems (MASs) with long input delay based on periodic event-triggered mechanism. Compared with other mechanisms, periodic event-triggered control can automatically guarantee a Zeno-free behavior and avoid the continuous monitoring of triggered conditions. First, a new periodic event-triggered distributed observer, which is based on the fully asynchronous communication data, is proposed to estimate the leader information. Second, a new distributed predictor feedback control method is proposed for the considered nonlinear MASs with input delay. By coordinate transformation, the MASs are mapped into new coupled ODE-PDE target systems with some disturbance-like terms. Then, we show that the COR problem is solvable. At last, to further save the communication resource, a periodic event-triggered mechanism is considered in the sensor-to-controller transmission in every agent. A new periodic event-triggered filter is proposed to deal with the periodic event-triggered feedback data. The MASs with input delay are mapped into coupled ODE-PDE target systems with sampled data information. Then, Lyapunov-Krasovskii functions are constructed to demonstrate the exponential stability of the MASs. Simulations verify the validity of the proposed results. △ Less

Submitted 8 September, 2024; originally announced September 2024.

Comments: Acceptted by IEEE Trans. Automatic Control

arXiv:2407.21323 [pdf]

doi 10.3390/tomography10120138

STANet: A Novel Spatio-Temporal Aggregation Network for Depression Classification with Small and Unbalanced FMRI Data

Authors: Wei Zhang, Weiming Zeng, Hongyu Chen, Jie Liu, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

Abstract: Accurate diagnosis of depression is crucial for timely implementation of optimal treatments, preventing complications and reducing the risk of suicide. Traditional methods rely on self-report questionnaires and clinical assessment, lacking objective biomarkers. Combining fMRI with artificial intelligence can enhance depression diagnosis by integrating neuroimaging indicators. However, the specific… ▽ More Accurate diagnosis of depression is crucial for timely implementation of optimal treatments, preventing complications and reducing the risk of suicide. Traditional methods rely on self-report questionnaires and clinical assessment, lacking objective biomarkers. Combining fMRI with artificial intelligence can enhance depression diagnosis by integrating neuroimaging indicators. However, the specificity of fMRI acquisition for depression often results in unbalanced and small datasets, challenging the sensitivity and accuracy of classification models. In this study, we propose the Spatio-Temporal Aggregation Network (STANet) for diagnosing depression by integrating CNN and RNN to capture both temporal and spatial features of brain activity. STANet comprises the following steps:(1) Aggregate spatio-temporal information via ICA. (2) Utilize multi-scale deep convolution to capture detailed features. (3) Balance data using the SMOTE to generate new samples for minority classes. (4) Employ the AFGRU classifier, which combines Fourier transformation with GRU, to capture long-term dependencies, with an adaptive weight assignment mechanism to enhance model generalization. The experimental results demonstrate that STANet achieves superior depression diagnostic performance with 82.38% accuracy and a 90.72% AUC. The STFA module enhances classification by capturing deeper features at multiple scales. The AFGRU classifier, with adaptive weights and stacked GRU, attains higher accuracy and AUC. SMOTE outperforms other oversampling methods. Additionally, spatio-temporal aggregated features achieve better performance compared to using only temporal or spatial features. STANet outperforms traditional or deep learning classifiers, and functional connectivity-based classifiers, as demonstrated by ten-fold cross-validation. △ Less

Submitted 28 November, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

Comments: This paper is published on Tomography

Journal ref: Tomography,2024

arXiv:2407.13211 [pdf]

Research on Image Super-Resolution Reconstruction Mechanism based on Convolutional Neural Network

Authors: Hao Yan, Zixiang Wang, Zhengjia Xu, Zhuoyue Wang, Zhizhong Wu, Ranran Lyu

Abstract: Super-resolution reconstruction techniques entail the utilization of software algorithms to transform one or more sets of low-resolution images captured from the same scene into high-resolution images. In recent years, considerable advancement has been observed in the domain of single-image super-resolution algorithms, particularly those based on deep learning techniques. Nevertheless, the extract… ▽ More Super-resolution reconstruction techniques entail the utilization of software algorithms to transform one or more sets of low-resolution images captured from the same scene into high-resolution images. In recent years, considerable advancement has been observed in the domain of single-image super-resolution algorithms, particularly those based on deep learning techniques. Nevertheless, the extraction of image features and nonlinear mapping methods in the reconstruction process remain challenging for existing algorithms. These issues result in the network architecture being unable to effectively utilize the diverse range of information at different levels. The loss of high-frequency details is significant, and the final reconstructed image features are overly smooth, with a lack of fine texture details. This negatively impacts the subjective visual quality of the image. The objective is to recover high-quality, high-resolution images from low-resolution images. In this work, an enhanced deep convolutional neural network model is employed, comprising multiple convolutional layers, each of which is configured with specific filters and activation functions to effectively capture the diverse features of the image. Furthermore, a residual learning strategy is employed to accelerate training and enhance the convergence of the network, while sub-pixel convolutional layers are utilized to refine the high-frequency details and textures of the image. The experimental analysis demonstrates the superior performance of the proposed model on multiple public datasets when compared with the traditional bicubic interpolation method and several other learning-based super-resolution methods. Furthermore, it proves the model's efficacy in maintaining image edges and textures. △ Less

Submitted 31 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.05605 [pdf, other]

doi 10.1109/ICASSP43922.2022.9746163

Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection

Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Minglei Ma, Yingen Yang

Abstract: The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames… ▽ More The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames independently, and does not consider their correlations. We propose the two-path GMM-ResNet and GMM-SENet models for spoofing detection, whose input is the Gaussian probability features based on two GMMs trained on genuine and spoofed speech respectively. The models consider not only the score distribution on GMM components, but also the relationship between adjacent frames. A two-step training scheme is applied to improve the system robustness. Experiments on the ASVspoof 2019 show that the LFCC+GMM-ResNet system can relatively reduce min-tDCF and EER by 76.1% and 76.3% on logical access scenario compared with the GMM, and the LFCC+GMM-SENet system by 94.4% and 95.4% on physical access scenario. After score fusion, the systems give the second-best results on both scenarios. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03135 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447141

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

Authors: Hui Yan, Zhenchun Lei, Changhong Liu, Yong Zhou

Abstract: With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in ASV tasks. In this paper, we propose the GMM-ResNext model for speaker verification. Conventional GMM does not consid… ▽ More With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in ASV tasks. In this paper, we propose the GMM-ResNext model for speaker verification. Conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. So, we extract the log Gaussian probability features based on the raw acoustic features and use ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines Generative and Discriminative Models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model parameters. A two-path GMM-ResNext model based on two gender-related GMMs has also been proposed. The Experimental results show that the proposed GMM-ResNext achieves relative improvements of 48.1\% and 11.3\% in EER compared with ResNet34 and ECAPA-TDNN on VoxCeleb1-O test set. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02170 [pdf, other]

doi 10.1109/ICASSP48485.2024.10447628

GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection

Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma

Abstract: Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthesis speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scal… ▽ More Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthesis speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Secondly, the grouping technique is used to improve the classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by ensemble of all group classifier outputs using the averaging method. Thirdly, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.0227 and an EER of 0.79\%. On the ASVspoof 2021 LA task, the GMM-ResNet2 achieves a minimum t-DCF of 0.2362 and an EER of 2.19\%, and represents a relative reductions of 31.4\% and 76.3\% compared with the LFCC-LCNN baseline. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02052 [pdf, other]

The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,… ▽ More This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position, respectively. For ASR, we employ an iterative pseudo-label generation method based on fusion model to obtain text labels of unsupervised data. To mitigate the impact of accent, an Accent-ASR framework is proposed, which captures pronunciation-related accent features at a fine-grained level and linguistic information at a coarse-grained level. On the ICMC-ASR eval set, the proposed system achieves a CER of 13.16% on track 1 and a cpCER of 21.48% on track 2, which significantly outperforms the official baseline system and obtains the first rank on both tracks. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted at ICASSP 2024

arXiv:2406.12707 [pdf, other]

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Authors: Haoqiu Yan, Yongxin Zhu, Kai Zheng, Bing Liu, Haoyu Cao, Deqiang Jiang, Linli Xu

Abstract: Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions… ▽ More Large Language Model (LLM)-enhanced agents become increasingly prevalent in Human-AI communication, offering vast potential from entertainment to professional domains. However, current multi-modal dialogue systems overlook the acoustic information present in speech, which is crucial for understanding human communication nuances. This oversight can lead to misinterpretations of speakers' intentions, resulting in inconsistent or even contradictory responses within dialogues. To bridge this gap, in this paper, we propose PerceptiveAgent, an empathetic multi-modal dialogue system designed to discern deeper or more subtle meanings beyond the literal interpretations of words through the integration of speech modality perception. Employing LLMs as a cognitive core, PerceptiveAgent perceives acoustic information from input speech and generates empathetic responses based on speaking styles described in natural language. Experimental results indicate that PerceptiveAgent excels in contextual understanding by accurately discerning the speakers' true intentions in scenarios where the linguistic meaning is either contrary to or inconsistent with the speaker's true feelings, producing more nuanced and expressive spoken dialogues. Code is publicly available at: \url{https://github.com/Haoqiu-Yan/PerceptiveAgent}. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 9 pages, 3 figures, ACL24 accepted

arXiv:2406.11799 [pdf, other]

Mix-Domain Contrastive Learning for Unpaired H&E-to-IHC Stain Translation

Authors: Song Wang, Zhong Zhang, Huan Yan, Ming Xu, Guanghui Wang

Abstract: H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pa… ▽ More H&E-to-IHC stain translation techniques offer a promising solution for precise cancer diagnosis, especially in low-resource regions where there is a shortage of health professionals and limited access to expensive equipment. Considering the pixel-level misalignment of H&E-IHC image pairs, current research explores the pathological consistency between patches from the same positions of the image pair. However, most of them overemphasize the correspondence between domains or patches, overlooking the side information provided by the non-corresponding objects. In this paper, we propose a Mix-Domain Contrastive Learning (MDCL) method to leverage the supervision information in unpaired H&E-to-IHC stain translation. Specifically, the proposed MDCL method aggregates the inter-domain and intra-domain pathology information by estimating the correlation between the anchor patch and all the patches from the matching images, encouraging the network to learn additional contrastive knowledge from mixed domains. With the mix-domain pathology information aggregation, MDCL enhances the pathological consistency between the corresponding patches and the component discrepancy of the patches from the different positions of the generated IHC image. Extensive experiments on two H&E-to-IHC stain translation datasets, namely MIST and BCI, demonstrate that the proposed method achieves state-of-the-art performance across multiple metrics. △ Less

Submitted 30 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.11352 [pdf, other]

Hierarchical Reinforcement Learning Empowered Task Offloading in V2I Networks

Authors: Xinyu You, Haojie Yan, Yuedong Xu, Lifeng Wang, Liangui Dai

Abstract: Edge computing plays an essential role in the vehicle-to-infrastructure (V2I) networks, where vehicles offload their intensive computation tasks to the road-side units for saving energy and reduce the latency. This paper designs the optimal task offloading policy to address the concerns involving processing delay, energy consumption and edge computing cost. Each computation task consisting of some… ▽ More Edge computing plays an essential role in the vehicle-to-infrastructure (V2I) networks, where vehicles offload their intensive computation tasks to the road-side units for saving energy and reduce the latency. This paper designs the optimal task offloading policy to address the concerns involving processing delay, energy consumption and edge computing cost. Each computation task consisting of some interdependent sub-tasks is characterized as a directed acyclic graph (DAG). In such dynamic networks, a novel hierarchical Offloading scheme is proposed by leveraging deep reinforcement learning (DRL). The inter-dependencies among the DAGs of the computation tasks are extracted using a graph neural network with attention mechanism. A parameterized DRL algorithm is developed to deal with the hierarchical action space containing both discrete and continuous actions. Simulation results with a real-world car speed dataset demonstrate that the proposed scheme can effectively reduce the system overhead. △ Less

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2403.06460 [pdf, other]

RIS-Enabled Joint Near-Field 3D Localization and Synchronization in SISO Multipath Environments

Authors: Han Yan, Hua Chen, Wei Liu, Songjie Yang, Gang Wang, Chau Yuen

Abstract: Reconfigurable Intelligent Surfaces (RIS) show great promise in the realm of 6th generation (6G) wireless systems, particularly in the areas of localization and communication. Their cost-effectiveness and energy efficiency enable the integration of numerous passive and reflective elements, enabling near-field propagation. In this paper, we tackle the challenges of RIS-aided 3D localization and syn… ▽ More Reconfigurable Intelligent Surfaces (RIS) show great promise in the realm of 6th generation (6G) wireless systems, particularly in the areas of localization and communication. Their cost-effectiveness and energy efficiency enable the integration of numerous passive and reflective elements, enabling near-field propagation. In this paper, we tackle the challenges of RIS-aided 3D localization and synchronization in multipath environments, focusing on the near-field of mmWave systems. Specifically, our approach involves formulating a maximum likelihood (ML) estimation problem for the channel parameters. To initiate this process, we leverage a combination of canonical polyadic decomposition (CPD) and orthogonal matching pursuit (OMP) to obtain coarse estimates of the time of arrival (ToA) and angle of departure (AoD) under the far-field approximation. Subsequently, distances are estimated using $l_{1}$-regularization based on a near-field model. Additionally, we introduce a refinement phase employing the spatial alternating generalized expectation maximization (SAGE) algorithm. Finally, a weighted least squares approach is applied to convert channel parameters into position and clock offset estimates. To extend the estimation algorithm to ultra-large (UL) RIS-assisted localization scenarios, it is further enhanced to reduce errors associated with far-field approximations, especially in the presence of significant near-field effects, achieved by narrowing the RIS aperture. Moreover, the Cramér-Rao Bound (CRB) is derived and the RIS phase shifts are optimized to improve the positioning accuracy. Numerical results affirm the efficacy of the proposed estimation algorithm. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.04145

A Crosstalk-Aware Timing Prediction Method in Routing

Authors: Leilei Jin, Jiajie Xu, Wenjie Fu, Hao Yan, Longxing Shi

Abstract: With shrinking interconnect spacing in advanced technology nodes, existing timing predictions become less precise due to the challenging quantification of crosstalk-induced delay. During the routing, the crosstalk effect is typically modeled by predicting coupling capacitance with congestion information. However, the timing estimation tends to be overly pessimistic, as the crosstalk-induced delay… ▽ More With shrinking interconnect spacing in advanced technology nodes, existing timing predictions become less precise due to the challenging quantification of crosstalk-induced delay. During the routing, the crosstalk effect is typically modeled by predicting coupling capacitance with congestion information. However, the timing estimation tends to be overly pessimistic, as the crosstalk-induced delay depends not only on the coupling capacitance but also on the signal arrival time. This work presents a crosstalk-aware timing estimation method using a two-step machine learning approach. Interconnects that are physically adjacent and overlap in signal timing windows are filtered first. Crosstalk delay is predicted by integrating physical topology and timing features without relying on post-routing results and the parasitic extraction. Experimental results show a match rate of over 99% for identifying crosstalk nets compared to the commercial tool on the OpenCores benchmarks, with prediction results being more accurate than those of other state-of-the-art methods. △ Less

Submitted 9 April, 2025; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: I would like to withdraw my submission because the work is incomplete. In particular, the calculations related to crosstalk need to be reconsidered. I plan to revise and improve the manuscript further

ACM Class: I.6.4; B.7.3

arXiv:2401.11675 [pdf, ps, other]

ATFusion: An Alternate Cross-Attention Transformer Network for Infrared and Visible Image Fusion

Authors: Han Yan, Songlei Xiong, Long Wang, Lihua Jian, Gemine Vivone

Abstract: The fusion of infrared and visible images is essential in remote sensing applications, as it combines the thermal information of infrared images with the detailed texture of visible images for more accurate analysis in tasks like environmental monitoring, target detection, and disaster management. The current fusion methods based on Transformer techniques for infrared and visible (IV) images have… ▽ More The fusion of infrared and visible images is essential in remote sensing applications, as it combines the thermal information of infrared images with the detailed texture of visible images for more accurate analysis in tasks like environmental monitoring, target detection, and disaster management. The current fusion methods based on Transformer techniques for infrared and visible (IV) images have exhibited promising performance. However, the attention mechanism of the previous Transformer-based methods was prone to extract common information from source images without considering the discrepancy information, which limited fusion performance. In this paper, by reevaluating the cross-attention mechanism, we propose an alternate Transformer fusion network (ATFusion) to fuse IV images. Our ATFusion consists of one discrepancy information injection module (DIIM) and two alternate common information injection modules (ACIIM). The DIIM is designed by modifying the vanilla cross-attention mechanism, which can promote the extraction of the discrepancy information of the source images. Meanwhile, the ACIIM is devised by alternately using the vanilla cross-attention mechanism, which can fully mine common information and integrate long dependencies. Moreover, the successful training of ATFusion is facilitated by a proposed segmented pixel loss function, which provides a good trade-off for texture detail and salient structure preservation. The qualitative and quantitative results on public datasets indicate our ATFusion is effective and superior compared to other state-of-the-art methods. △ Less

Submitted 17 June, 2025; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: v2 Update: Enhanced version with improved model stability (new augmentation/regularization), added benchmarks (SwinFuse/AEFusion/LRRNet, FMI/SSIM), and RGB-NIR dataset. Updated authorship (H.Yan first, L.Jian corresponding). Under review at Infrared Physics & Technology. Original: arXiv:2401.11675v1

arXiv:2310.09937 [pdf, other]

Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion

Authors: Long Bai, Shilong Yao, Kun Gao, Yanjun Huang, Ruijie Tang, Hong Yan, Max Q. -H. Meng, Hongliang Ren

Abstract: Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture… ▽ More Considering that Coupled Dictionary Learning (CDL) method can obtain a reasonable linear mathematical relationship between resource images, we propose a novel CDL-based Synthetic Aperture Radar (SAR) and multispectral pseudo-color fusion method. Firstly, the traditional Brovey transform is employed as a pre-processing method on the paired SAR and multispectral images. Then, CDL is used to capture the correlation between the pre-processed image pairs based on the dictionaries generated from the source images via enforced joint sparse coding. Afterward, the joint sparse representation in the pair of dictionaries is utilized to construct an image mask via calculating the reconstruction errors, and therefore generate the final fusion image. The experimental verification results of the SAR images from the Sentinel-1 satellite and the multispectral images from the Landsat-8 satellite show that the proposed method can achieve superior visual effects, and excellent quantitative performance in terms of spectral distortion, correlation coefficient, MSE, NIQE, BRISQUE, and PIQE. △ Less

Submitted 15 October, 2023; originally announced October 2023.

Comments: To appear in IEEE Sensors Journal

arXiv:2306.10275 [pdf, other]

Multi-Scale Simulation of Complex Systems: A Perspective of Integrating Knowledge and Data

Authors: Huandong Wang, Huan Yan, Can Rong, Yuan Yuan, Fenyu Jiang, Zhenyu Han, Hongjie Sui, Depeng Jin, Yong Li

Abstract: Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we will syste… ▽ More Complex system simulation has been playing an irreplaceable role in understanding, predicting, and controlling diverse complex systems. In the past few decades, the multi-scale simulation technique has drawn increasing attention for its remarkable ability to overcome the challenges of complex system simulation with unknown mechanisms and expensive computational costs. In this survey, we will systematically review the literature on multi-scale simulation of complex systems from the perspective of knowledge and data. Firstly, we will present background knowledge about simulating complex system simulation and the scales in complex systems. Then, we divide the main objectives of multi-scale modeling and simulation into five categories by considering scenarios with clear scale and scenarios with unclear scale, respectively. After summarizing the general methods for multi-scale simulation based on the clues of knowledge and data, we introduce the adopted methods to achieve different objectives. Finally, we introduce the applications of multi-scale simulation in typical matter systems and social systems. △ Less

Submitted 23 July, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

ACM Class: I.2; I.6; J.2; J.4

arXiv:2303.06543 [pdf, other]

MetaUE: Model-based Meta-learning for Underwater Image Enhancement

Authors: Zhenwei Zhang, Haorui Yan, Ke Tang, Yuping Duan

Abstract: The challenges in recovering underwater images are the presence of diverse degradation factors and the lack of ground truth images. Although synthetic underwater image pairs can be used to overcome the problem of inadequately observing data, it may result in over-fitting and enhancement degradation. This paper proposes a model-based deep learning method for restoring clean images under various und… ▽ More The challenges in recovering underwater images are the presence of diverse degradation factors and the lack of ground truth images. Although synthetic underwater image pairs can be used to overcome the problem of inadequately observing data, it may result in over-fitting and enhancement degradation. This paper proposes a model-based deep learning method for restoring clean images under various underwater scenarios, which exhibits good interpretability and generalization ability. More specifically, we build up a multi-variable convolutional neural network model to estimate the clean image, background light and transmission map, respectively. An efficient loss function is also designed to closely integrate the variables based on the underwater image model. The meta-learning strategy is used to obtain a pre-trained model on the synthetic underwater dataset, which contains different types of degradation to cover the various underwater environments. The pre-trained model is then fine-tuned on real underwater datasets to obtain a reliable underwater image enhancement model, called MetaUE. Numerical experiments demonstrate that the pre-trained model has good generalization ability, allowing it to remove the color degradation for various underwater attenuation images such as blue, green and yellow, etc. The fine-tuning makes the model able to adapt to different underwater datasets, the enhancement results of which outperform the state-of-the-art underwater image restoration methods. All our codes and data are available at \url{https://github.com/Duanlab123/MetaUE}. △ Less

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2302.13251 [pdf, other]

Unsupervised Domain Adaptation for Low-dose CT Reconstruction via Bayesian Uncertainty Alignment

Authors: Kecheng Chen, Jie Liu, Renjie Wan, Victor Ho-Fun Lee, Varut Vardhanabhuti, Hong Yan, Haoliang Li

Abstract: Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised d… ▽ More Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance. △ Less

Submitted 2 June, 2024; v1 submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We develope… ▽ More Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short of boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information to enhance the boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component. △ Less

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.03883 [pdf, other]

Rethinking the Detection Head Configuration for Traffic Object Detection

Authors: Yi Shi, Jiang Wu, Shixuan Zhao, Gangyao Gao, Tao Deng, Hongmei Yan

Abstract: Multi-scale detection plays an important role in object detection models. However, researchers usually feel blank on how to reasonably configure detection heads combining multi-scale features at different input resolutions. We find that there are different matching relationships between the object distribution and the detection head at different input resolutions. Based on the instructive findings… ▽ More Multi-scale detection plays an important role in object detection models. However, researchers usually feel blank on how to reasonably configure detection heads combining multi-scale features at different input resolutions. We find that there are different matching relationships between the object distribution and the detection head at different input resolutions. Based on the instructive findings, we propose a lightweight traffic object detection network based on matching between detection head and object distribution, termed as MHD-Net. It consists of three main parts. The first is the detection head and object distribution matching strategy, which guides the rational configuration of detection head, so as to leverage multi-scale features to effectively detect objects at vastly different scales. The second is the cross-scale detection head configuration guideline, which instructs to replace multiple detection heads with only two detection heads possessing of rich feature representations to achieve an excellent balance between detection accuracy, model parameters, FLOPs and detection speed. The third is the receptive field enlargement method, which combines the dilated convolution module with shallow features of backbone to further improve the detection accuracy at the cost of increasing model parameters very slightly. The proposed model achieves more competitive performance than other models on BDD100K dataset and our proposed ETFOD-v2 dataset. The code will be available. △ Less

Submitted 7 October, 2022; originally announced October 2022.

Comments: 26 pages, 4 figures, 7 tables

arXiv:2209.12764 [pdf, other]

Graph Neural Network and Superpixel Based Brain Tissue Segmentation (Corrected Version)

Authors: Chong Wu, Zhenan Feng, Houwang Zhang, Hong Yan

Abstract: Convolutional neural networks (CNNs) are usually used as a backbone to design methods in biomedical image segmentation. However, the limitation of receptive field and large number of parameters limit the performance of these methods. In this paper, we propose a graph neural network (GNN) based method named GNN-SEG for the segmentation of brain tissues. Different to conventional CNN based methods,… ▽ More Convolutional neural networks (CNNs) are usually used as a backbone to design methods in biomedical image segmentation. However, the limitation of receptive field and large number of parameters limit the performance of these methods. In this paper, we propose a graph neural network (GNN) based method named GNN-SEG for the segmentation of brain tissues. Different to conventional CNN based methods, GNN-SEG takes superpixels as basic processing units and uses GNNs to learn the structure of brain tissues. Besides, inspired by the interaction mechanism in biological vision systems, we propose two kinds of interaction modules for feature enhancement and integration. In the experiments, we compared GNN-SEG with state-of-the-art CNN based methods on four datasets of brain magnetic resonance images. The experimental results show the superiority of GNN-SEG. △ Less

Submitted 21 September, 2022; originally announced September 2022.

Comments: The original version of this paper was accepted and presented at 2022 International Joint Conference on Neural Networks (IJCNN). This version corrects the mistakes in Figs. 7 and 8

arXiv:2208.08855 [pdf, other]

Adaptive Partially-Observed Sequential Change Detection and Isolation

Authors: Xinyu Zhao, Jiuyun Hu, Yajun Mei, Hao Yan

Abstract: High-dimensional data has become popular due to the easy accessibility of sensors in modern industrial applications. However, one specific challenge is that it is often not easy to obtain complete measurements due to limited sensing powers and resource constraints. Furthermore, distinct failure patterns may exist in the systems, and it is necessary to identify the true failure pattern. This work f… ▽ More High-dimensional data has become popular due to the easy accessibility of sensors in modern industrial applications. However, one specific challenge is that it is often not easy to obtain complete measurements due to limited sensing powers and resource constraints. Furthermore, distinct failure patterns may exist in the systems, and it is necessary to identify the true failure pattern. This work focuses on the online adaptive monitoring of high-dimensional data in resource-constrained environments with multiple potential failure modes. To achieve this, we propose to apply the Shiryaev-Roberts procedure on the failure mode level and utilize the multi-arm bandit to balance the exploration and exploitation. We further discuss the theoretical property of the proposed algorithm to show that the proposed method can correctly isolate the failure mode. Finally, extensive simulations and two case studies demonstrate that the change point detection performance and the failure mode isolation accuracy can be greatly improved. △ Less

Submitted 25 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted in Technometrics

arXiv:2205.03599 [pdf, other]

GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction

Authors: Chengdong Lan, Hao Yan, Cheng Luo, Tiesong Zhao

Abstract: The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SI). Typically, depth maps are used to construct SI. However, their methods suffer from inaccur… ▽ More The introduction of multiple viewpoints in video scenes inevitably increases the bitrates required for storage and transmission. To reduce bitrates, researchers have developed methods to skip intermediate viewpoints during compression and delivery, and ultimately reconstruct them using Side Information (SI). Typically, depth maps are used to construct SI. However, their methods suffer from inaccuracies in reconstruction and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SI. Additionally, we consider incorporating information from adjacent temporal and spatial viewpoints to further reduce SI redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize a convolutional network to extract the latent code of a GAN as SI. At the decoder side, we combine the SI and adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint for reconstruction cost and SI entropy to achieve an optimal trade-off between reconstruction quality and bitrates overhead. Experiments demonstrate significantly improved Rate-Distortion (RD) performance compared with state-of-the-art methods. △ Less

Submitted 5 May, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

arXiv:2201.04397 [pdf, other]

Towards Adversarially Robust Deep Image Denoising

Authors: Hanshu Yan, Jingfeng Zhang, Jiashi Feng, Masashi Sugiyama, Vincent Y. F. Tan

Abstract: This work systematically investigates the adversarial robustness of deep image denoisers (DIDs), i.e, how well DIDs can recover the ground truth from noisy observations degraded by adversarial perturbations. Firstly, to evaluate DIDs' robustness, we propose a novel adversarial attack, namely Observation-based Zero-mean Attack ({\sc ObsAtk}), to craft adversarial zero-mean perturbations on given no… ▽ More This work systematically investigates the adversarial robustness of deep image denoisers (DIDs), i.e, how well DIDs can recover the ground truth from noisy observations degraded by adversarial perturbations. Firstly, to evaluate DIDs' robustness, we propose a novel adversarial attack, namely Observation-based Zero-mean Attack ({\sc ObsAtk}), to craft adversarial zero-mean perturbations on given noisy images. We find that existing DIDs are vulnerable to the adversarial noise generated by {\sc ObsAtk}. Secondly, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training ({\sc HAT}), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth. The resultant DIDs can effectively remove various types of synthetic and adversarial noise. We also uncover that the robustness of DIDs benefits their generalization capability on unseen real-world noise. Indeed, {\sc HAT}-trained DIDs can recover high-quality clean images from real-world noise even without training on real noisy data. Extensive experiments on benchmark datasets, including Set68, PolyU, and SIDD, corroborate the effectiveness of {\sc ObsAtk} and {\sc HAT}. △ Less

Submitted 13 January, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Showing 1–50 of 87 results for author: Yan, H