-
Dynamic Frequency Feature Fusion Network for Multi-Source Remote Sensing Data Classification
Authors:
Yikang Zhao,
Feng Gao,
Xuepeng Jin,
Junyu Dong,
Qian Du
Abstract:
Multi-source data classification is a critical yet challenging task for remote sensing image interpretation. Existing methods lack adaptability to diverse land cover types when modeling frequency domain features. To this end, we propose a Dynamic Frequency Feature Fusion Network (DFFNet) for hyperspectral image (HSI) and Synthetic Aperture Radar (SAR) / Light Detection and Ranging (LiDAR) data joint classification. Specifically, we design a dynamic filter block to dynamically learn the filter kernels in the frequency domain by aggregating the input features. The frequency contextual knowledge is injected into the frequency filter kernels. Additionally, we propose a spectral-spatial adaptive fusion block for cross-modal feature fusion. It enhances the spectral and spatial attention weight interactions via a channel shuffle operation, thereby providing comprehensive cross-modal feature fusion. Experiments on two benchmark datasets show that our DFFNet outperforms state-of-the-art methods in multi-source data classification. The code will be made publicly available at https://github.com/oucailab/DFFNet.
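The channel shuffle operation mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic ShuffleNet-style interleaving of channel groups; how DFFNet applies it to spectral and spatial attention weights is not stated here, so the function name `channel_shuffle`, the two-group split, and the toy labels are assumptions made for illustration.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (generic channel shuffle).

    `channels` is a flat list ordered group by group; the result takes
    one channel from each group in turn.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # View the list as a (groups, per_group) grid, transpose it, flatten.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Hypothetical labels: three "spectral" and three "spatial" weight channels.
weights = ["spe0", "spe1", "spe2", "spa0", "spa1", "spa2"]
shuffled = channel_shuffle(weights, groups=2)
# shuffled alternates spectral and spatial entries
```

After the shuffle, any operation that acts on neighbouring channels sees entries from both groups, which is the usual reason for using it to mix information across groups.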
Submitted 6 July, 2025;
originally announced July 2025.
-
Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems
Authors:
Jinwei Hu,
Zezhi Tang,
Xin Jin,
Benyuan Zhang,
Yi Dong,
Xiaowei Huang
Abstract:
This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management (PHM) systems in Industrial Cyber-Physical Systems (ICPS). Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions from both global and local perspectives. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO's ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.
Submitted 5 July, 2025;
originally announced July 2025.
-
Double Low-Rank 4D Tensor Decomposition for Circular RIS-Aided mmWave MIMO-NOMA System Channel Estimation in Mobility Scenarios
Authors:
Wanyuan Cai,
Xiaoping Jin,
Youming Li,
Menglei Sheng,
Mingjun Huang,
Qinke Qi,
Qiang Guo
Abstract:
Channel estimation is not only essential to highly reliable data transmission and massive device access but also an important component of integrated sensing and communication (ISAC) in sixth-generation (6G) mobile communication systems. In this paper, we consider a downlink channel estimation problem for a circular reconfigurable intelligent surface (RIS)-aided millimeter-wave (mmWave) multiple-input multiple-output non-orthogonal multiple access (MIMO-NOMA) system in mobility scenarios. First, we propose a subframe partitioning scheme to facilitate the modeling of the received signal as a fourth-order tensor satisfying a canonical polyadic decomposition (CPD) form, thereby formulating the channel estimation problem as tensor decomposition and parameter extraction problems. Then, by exploiting both the global and local low-rank properties of the received signal, we propose a double low-rank 4D tensor decomposition model to decompose the received signal into four factor matrices, which is efficiently solved via the alternating direction method of multipliers (ADMM). Subsequently, we propose a two-stage parameter estimation method based on the Jacobi-Anger expansion and the special structure of the circular RIS to uniquely decouple the angle parameters. Furthermore, the time delay, Doppler shift, and channel gain parameters can also be estimated without ambiguities, and their estimation accuracy can be efficiently improved, especially at low signal-to-noise ratio (SNR). Finally, a concise closed-form expression for the Cramér-Rao bound (CRB) is derived as a performance benchmark. Numerical experiments are conducted to demonstrate the effectiveness of the proposed method compared with the other discussed methods.
Submitted 9 June, 2025;
originally announced June 2025.
-
NAT: Neural Acoustic Transfer for Interactive Scenes in Real Time
Authors:
Xutong Jin,
Bo Pang,
Chenxi Xu,
Xinyun Hou,
Guoping Wang,
Sheng Li
Abstract:
Previous acoustic transfer methods rely on extensive precomputation and storage of data to enable real-time interaction and auditory feedback. However, these methods struggle with complex scenes, especially when dynamic changes in object position, material, and size significantly alter sound effects. These continuous variations lead to fluctuating acoustic transfer distributions, making it challenging to represent with basic data structures and render efficiently in real time. To address this challenge, we present Neural Acoustic Transfer, a novel approach that utilizes an implicit neural representation to encode precomputed acoustic transfer and its variations, allowing for real-time prediction of sound fields under varying conditions. To efficiently generate the training data required for the neural acoustic field, we developed a fast Monte-Carlo-based boundary element method (BEM) approximation for general scenarios with smooth Neumann conditions. Additionally, we implemented a GPU-accelerated version of standard BEM for scenarios requiring higher precision. These methods provide the necessary training data, enabling our neural network to accurately model the sound radiation space. We demonstrate our method's numerical accuracy and runtime efficiency (within several milliseconds for 30s audio) through comprehensive validation and comparisons in diverse acoustic transfer scenarios. Our approach allows for efficient and accurate modeling of sound behavior in dynamically changing environments, which can benefit a wide range of interactive applications such as virtual reality, augmented reality, and advanced audio production.
Submitted 6 June, 2025;
originally announced June 2025.
-
NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results
Authors:
Sangmin Lee,
Eunpil Park,
Angel Canelo,
Hyunhee Park,
Youngjo Kim,
Hyung-Ju Chun,
Xin Jin,
Chongyi Li,
Chun-Le Guo,
Radu Timofte,
Qi Wu,
Tianheng Qiu,
Yuchun Dong,
Shenglin Ding,
Guanghua Pan,
Weiyu Zhou,
Tao Hu,
Yixu Feng,
Duwei Dai,
Yu Cao,
Peng Wu,
Wei Dong,
Yanning Zhang,
Qingsen Yan,
Simon J. Larsen
, et al. (11 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effectively fusing these frames while adhering to strict efficiency constraints: fewer than 30 million model parameters and a computational budget under 4.0 trillion FLOPs. A total of 217 participants registered, with six teams finally submitting valid solutions. The top-performing approach achieved a PSNR of 43.22 dB, showcasing the potential of novel methods in this domain. This paper provides a comprehensive overview of the challenge, compares the proposed solutions, and serves as a valuable reference for researchers and practitioners in efficient burst HDR and restoration.
Submitted 17 May, 2025;
originally announced May 2025.
-
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Yufei Wang,
Wenhan Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Qiyu Rong,
Hongyuan Jing,
Mengmeng Zhang,
Jinglong Li,
Xiangyu Lu,
Yi Ren,
Yuting Liu,
Meng Zhang,
Xiang Chen,
Qiyuan Guan,
Jiangxin Dong,
Jinshan Pan,
Conglin Gou
, et al. (112 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which were developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, including day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. The competition had a total of 361 participants, and 32 teams submitted valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.
Submitted 19 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
High-Precision Overlay Registration via Spatial-Terminal Iterative Learning in Roll-to-Roll Manufacturing
Authors:
Zifeng Wang,
Xiaoning Jin
Abstract:
Roll-to-roll (R2R) printing technologies are promising for high-volume continuous production of substrate-based electronic products. One of the major challenges in R2R flexible electronics printing is achieving tight alignment tolerances, as specified by the device resolution (usually at the micro-meter level), for multi-layer printed electronics. The alignment of the printed patterns in different layers is known as registration. Conventional registration control methods rely on real-time feedback controllers, such as PID control, to regulate the web tension and the web speed. However, those methods may lose effectiveness in compensating for recurring disturbances and supporting effective mitigation of registration errors. In this paper, we propose a Spatial-Terminal Iterative Learning Control (STILC) method integrated with PID control to iteratively learn and reduce registration error cycle-by-cycle, converging it to zero. This approach enables unprecedented precision in the creation, integration, and manipulation of multi-layer microstructures in R2R processes. We theoretically prove the convergence of the proposed STILC-PID hybrid approach and validate its effectiveness through a simulated registration error scenario caused by axis mismatch between roller and motor, a common issue in R2R systems. The results demonstrate that the STILC-PID hybrid control method can fully eliminate the registration error after a feasible number of iterations. Additionally, we analyze the impact of different learning gains on the convergence performance of STILC.
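The cycle-by-cycle learning idea can be illustrated with a generic terminal iterative learning update, u_{k+1} = u_k + L e_k, where e_k is the error measured at the end of cycle k. This is a toy sketch and not the paper's STILC-PID law: the static plant model (a gain plus a constant recurring disturbance), the function name `run_ilc`, and all numeric values are assumptions.

```python
def run_ilc(reference, plant_gain, disturbance, learning_gain, iterations):
    """Generic terminal ILC on a toy static plant y = plant_gain * u + disturbance.

    Returns the terminal error observed in each cycle.
    """
    u = 0.0  # feed-forward correction, refined once per cycle
    errors = []
    for _ in range(iterations):
        y = plant_gain * u + disturbance   # outcome at the end of the cycle
        e = reference - y                  # terminal (registration-like) error
        errors.append(e)
        u = u + learning_gain * e          # learn from this cycle for the next
    return errors


# Hypothetical numbers: the error contracts by |1 - learning_gain * plant_gain| per cycle.
errors = run_ilc(reference=1.0, plant_gain=2.0, disturbance=0.2,
                 learning_gain=0.25, iterations=10)
```

For this toy plant the error obeys e_{k+1} = (1 - L g) e_k, so it shrinks toward zero whenever |1 - L g| < 1. That makes the role of the learning gain explicit: too small a gain converges slowly, and too large a gain diverges.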
Submitted 11 March, 2025;
originally announced March 2025.
-
Reinforcement Learning based Constrained Optimal Control: an Interpretable Reward Design
Authors:
Jingjie Ni,
Fangfei Li,
Xin Jin,
Xianlun Peng,
Yang Tang
Abstract:
This paper presents an interpretable reward design framework for reinforcement learning based constrained optimal control problems with state and terminal constraints. The problem is formalized within a standard partially observable Markov decision process framework. The reward function is constructed from four weighted components: a terminal constraint reward, a guidance reward, a penalty for state constraint violations, and a cost reduction incentive reward. A theoretically justified reward design is then presented, which establishes bounds on the weights of the components. This approach ensures that constraints are satisfied and objectives are optimized while mitigating numerical instability. Acknowledging the importance of prior knowledge in reward design, we sequentially solve two subproblems, using each solution to inform the reward design for the subsequent problem. Subsequently, we integrate reinforcement learning with curriculum learning, utilizing policies derived from simpler subproblems to assist in tackling more complex challenges, thereby facilitating convergence. The framework is evaluated against original and randomly weighted reward designs in a multi-agent particle environment. Experimental results demonstrate that the proposed approach significantly enhances satisfaction of terminal and state constraints and optimization of control cost.
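A weighted four-component reward of the kind described in this abstract can be sketched as follows. The component names follow the abstract, but the sign convention (the violation term is subtracted), the `RewardWeights` container, and the example weights are assumptions. The paper's theoretically derived weight bounds are not given here and are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for the four reward components."""
    terminal: float
    guidance: float
    violation: float
    cost_reduction: float


def total_reward(terminal, guidance, violation, cost_reduction, w):
    """Combine the four components into one scalar reward.

    `violation` is a non-negative measure of state-constraint violation
    and enters as a penalty (an assumed sign convention).
    """
    if violation < 0:
        raise ValueError("violation must be non-negative")
    return (w.terminal * terminal
            + w.guidance * guidance
            - w.violation * violation
            + w.cost_reduction * cost_reduction)


# Placeholder weights chosen only to make the example concrete.
w = RewardWeights(terminal=10.0, guidance=1.0, violation=5.0, cost_reduction=0.5)
r = total_reward(terminal=1.0, guidance=0.2, violation=0.1, cost_reduction=0.4, w=w)
```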
Submitted 14 February, 2025;
originally announced February 2025.
-
TINQ: Temporal Inconsistency Guided Blind Video Quality Assessment
Authors:
Yixiao Li,
Xiaoyuan Yang,
Weide Liu,
Xin Jin,
Xu Jia,
Yukun Lai,
Haotao Liu,
Paul L Rosin,
Wei Zhou
Abstract:
Blind video quality assessment (BVQA) has been actively researched for user-generated content (UGC) videos. Recently, super-resolution (SR) techniques have been widely applied in UGC. Therefore, an effective BVQA method for both UGC and SR scenarios is essential. Temporal inconsistency, referring to irregularities between consecutive frames, is relevant to video quality. Current BVQA approaches typically model temporal relationships in UGC videos using statistics of motion information, but inconsistencies remain unexplored. Additionally, different from temporal inconsistency in UGC videos, such inconsistency in SR videos is amplified due to upscaling algorithms. In this paper, we introduce the Temporal Inconsistency Guided Blind Video Quality Assessment (TINQ) metric, demonstrating that exploring temporal inconsistency is crucial for effective BVQA. Since temporal inconsistencies vary between UGC and SR videos, they are calculated in different ways. Based on this, a spatial module highlights inconsistent areas across consecutive frames at coarse and fine granularities. In addition, a temporal module aggregates features over time in two stages. The first stage employs a visual memory capacity block to adaptively segment the time dimension based on estimated complexity, while the second stage focuses on selecting key features. The stages work together through Consistency-aware Fusion Units to regress cross-time-scale video quality. Extensive experiments on UGC and SR video quality datasets show that our method outperforms existing state-of-the-art BVQA methods. Code is available at https://github.com/Lighting-YXLI/TINQ.
Submitted 25 December, 2024;
originally announced December 2024.
-
Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task
Authors:
Jinming Liu,
Yuntao Wei,
Junyan Lin,
Shengyang Zhao,
Heming Sun,
Zhibo Chen,
Wenjun Zeng,
Xin Jin
Abstract:
While learned image compression methods have achieved impressive results in either human visual perception or machine vision tasks, they are often specialized only for one domain. This drawback limits their versatility and generalizability across scenarios and also requires retraining to adapt to new applications, a process that adds significant complexity and cost in real-world scenarios. In this study, we introduce an innovative semantics DISentanglement and COmposition VERsatile codec (DISCOVER) to simultaneously enhance human-eye perception and machine vision tasks. The approach derives a set of labels per task through multimodal large models, to which grounding models are then applied for precise localization, enabling a comprehensive understanding and disentanglement of image components at the encoder side. At the decoding stage, a comprehensive reconstruction of the image is achieved by leveraging these encoded components alongside priors from generative models, thereby optimizing performance for both human visual perception and machine-based analytical tasks. Extensive experimental evaluations substantiate the robustness and effectiveness of DISCOVER, demonstrating superior performance in fulfilling the dual objectives of human and machine vision requirements.
Submitted 23 December, 2024;
originally announced December 2024.
-
A Diffuse Light Field Imaging Model for Forward-Scattering Photon-Coded Signal Retrieval
Authors:
Hongkun Cao,
Xin Jin,
Junjie Wei,
Yihui Fan,
Dongyu Du
Abstract:
Scattering imaging is often hindered by extremely low signal-to-noise ratios (SNRs) due to the prevalence of scattering noise. Light field imaging has been shown to be effective in suppressing noise and collecting more ballistic photons as signals. However, to overcome the SNR limit in super-strong scattering environments, even with a light field framework, the rare ballistic signals alone are insufficient. Inspired by radiative transfer theory, we propose a diffuse light field imaging model (DLIM) that leverages light field imaging to retrieve forward-scattered photons as signals to overcome the challenges of low-SNR imaging caused by super-strong scattering environments. This model aims to recover the ballistic photon signal as a source term from forward-scattered photons based on diffusion equations. The DLIM consists of two main processes: radiance modeling and diffusion light-field approximation. Radiance modeling analyzes the radiance distribution in scattering light field images using a proposed three-plane parameterization, which solves a 4-D radiance kernel describing the impulse function of the scattering light field. Then, the scattering light field images synthesize a diffuse source satisfying the diffusion equation governing forward-scattered photons, solved under Neumann boundary conditions in imaging space. This is the first physically-aware scattering light field imaging model, extending the conventional light field imaging framework from free space into diffuse space. Extensive experiments confirm that the DLIM can reconstruct the target objects even when scattering light field images are reduced to random noise at extremely low SNRs.
Submitted 9 November, 2024;
originally announced November 2024.
-
Self-supervised inter-intra period-aware ECG representation learning for detecting atrial fibrillation
Authors:
Xiangqian Zhu,
Mengnan Shi,
Xuexin Yu,
Chang Liu,
Xiaocong Lian,
Jintao Fei,
Jiangying Luo,
Xin Jin,
Ping Zhang,
Xiangyang Ji
Abstract:
Atrial fibrillation is a commonly encountered clinical arrhythmia associated with stroke and increased mortality. Since professional medical knowledge is required for annotation, exploiting a large corpus of ECGs to develop accurate supervised learning-based atrial fibrillation algorithms remains challenging. Self-supervised learning (SSL) is a promising recipe for generalized ECG representation learning, eliminating the dependence on expensive labeling. However, without well-designed incorporations of knowledge related to atrial fibrillation, existing SSL approaches typically suffer from unsatisfactory capture of robust ECG representations. In this paper, we propose an inter-intra period-aware ECG representation learning approach. Considering that ECGs of atrial fibrillation patients exhibit irregularity in RR intervals and the absence of P-waves, we develop specific pre-training tasks for interperiod and intraperiod representations, aiming to learn the single-period stable morphology representation while retaining crucial interperiod features. After further fine-tuning, our approach demonstrates remarkable AUC performance on the BTCH dataset, i.e., 0.953/0.996 for paroxysmal/persistent atrial fibrillation detection. On commonly used benchmarks of CinC2017 and CPSC2021, the generalization capability and effectiveness of our methodology are substantiated with competitive results.
Submitted 8 October, 2024;
originally announced October 2024.
-
DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks
Authors:
Xutong Jin,
Chenxi Xu,
Ruohan Gao,
Jiajun Wu,
Guoping Wang,
Sheng Li
Abstract:
Accurately estimating and simulating the physical properties of objects from real-world sound recordings is of great practical importance in the fields of vision, graphics, and robotics. However, the progress in these directions has been limited -- prior differentiable rigid or soft body simulation techniques cannot be directly applied to modal sound synthesis due to the high sampling rate of audio, while previous audio synthesizers often do not fully model the accurate physical properties of the sounding objects. We propose DiffSound, a differentiable sound rendering framework for physics-based modal sound synthesis, which is based on an implicit shape representation, a new high-order finite element analysis module, and a differentiable audio synthesizer. Our framework can solve a wide range of inverse problems thanks to the differentiability of the entire pipeline, including physical parameter estimation, geometric shape reasoning, and impact position prediction. Experimental results demonstrate the effectiveness of our approach, highlighting its ability to accurately reproduce the target sound in a physics-based manner. DiffSound serves as a valuable tool for various sound synthesis and analysis applications.
Submitted 20 September, 2024;
originally announced September 2024.
-
A corpus-based investigation of pitch contours of monosyllabic words in conversational Taiwan Mandarin
Authors:
Xiaoyun Jin,
Mirjam Ernestus,
R. Harald Baayen
Abstract:
In Mandarin, the tonal contours of monosyllabic words produced in isolation or in careful speech are characterized by four lexical tones: a high-level tone (T1), a rising tone (T2), a dipping tone (T3) and a falling tone (T4). However, in spontaneous speech, the actual tonal realization of monosyllabic words can deviate significantly from these canonical tones due to intra-syllabic co-articulation and inter-syllabic co-articulation with adjacent tones. In addition, Chuang et al. (2024) recently reported that the tonal contours of disyllabic Mandarin words with the T2-T4 tone pattern are co-determined by their meanings. Following up on their research, we present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of contextual predictors on the one hand, and the way in which words' meanings co-determine pitch contours on the other hand. We analyze the F0 contours of 3824 tokens of 63 different word types in a spontaneous Taiwan Mandarin corpus, using the generalized additive (mixed) model to decompose a given observed pitch contour into a set of component pitch contours. We show that the tonal context substantially modifies a word's canonical tone. Once the effect of tonal context is controlled for, T2 and T3 emerge as low flat tones, contrasting with T1 as a high tone, and with T4 as a high-to-mid falling tone. The neutral tone (T0), which, in standard descriptions, is realized based on the preceding tone, emerges as a low tone in its own right, modified by the other predictors in the same way as the standard tones T1, T2, T3, and T4. We also show that word, and even more so, word sense, co-determine words' F0 contours. Analyses of variable importance using random forests further supported the substantial effect of tonal context and an effect of word sense.
Submitted 19 October, 2024; v1 submitted 12 September, 2024;
originally announced September 2024.
-
MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification
Authors:
Feng Gao,
Xuepeng Jin,
Xiaowei Zhou,
Junyu Dong,
Qian Du
Abstract:
In the field of multi-source remote sensing image classification, remarkable progress has been made by using Convolutional Neural Network (CNN) and Transformer. Recently, Mamba-based methods built upon the State Space Model (SSM) have shown great potential for long-range dependency modeling with linear complexity, but they have rarely been explored for multi-source remote sensing image classification tasks. To address this issue, we propose the Multi-Scale Feature Fusion Mamba (MSFMamba) network, a novel framework designed for the joint classification of hyperspectral image (HSI) and Light Detection and Ranging (LiDAR)/Synthetic Aperture Radar (SAR) data. The MSFMamba network is composed of three key components: the Multi-Scale Spatial Mamba (MSpa-Mamba) block, the Spectral Mamba (Spe-Mamba) block, and the Fusion Mamba (Fus-Mamba) block. The MSpa-Mamba block employs a multi-scale strategy to reduce computational cost and alleviate feature redundancy in multiple scanning routes, ensuring efficient spatial feature modeling. The Spe-Mamba block focuses on spectral feature extraction, addressing the unique challenges of HSI data representation. Finally, the Fus-Mamba block bridges the heterogeneous gap between HSI and LiDAR/SAR data by extending the original Mamba architecture to accommodate dual inputs, enhancing cross-modal feature interactions and enabling seamless data fusion. Together, these components enable MSFMamba to effectively tackle the challenges of multi-source data classification, delivering improved performance with optimized computational efficiency. Comprehensive experiments on four real-world multi-source remote sensing datasets demonstrate that MSFMamba outperforms several state-of-the-art methods. The source code of MSFMamba is publicly available at https://github.com/oucailab/MSFMamba.
Submitted 26 January, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Authors:
Jinming Liu,
Ruoyu Feng,
Yunpeng Qi,
Qiuyu Chen,
Zhibo Chen,
Wenjun Zeng,
Xin Jin
Abstract:
Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challenges, we propose a rate-distortion-cognition controllable versatile image compression method, which allows users to adjust the bitrate (i.e., Rate), image reconstruction quality (i.e., Distortion), and machine task accuracy (i.e., Cognition) with a single neural model, achieving ultra-controllability. Specifically, we first introduce a cognition-oriented loss in the primary compression branch to train a codec for diverse machine tasks. This branch attains variable bitrate by regulating the quantization degree through the latent code channels. To further enhance the quality of the reconstructed images, we employ an auxiliary branch to supplement residual information with a scalable bitstream. Ultimately, the two branches use a $\beta x + (1 - \beta) y$ interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
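The interpolation strategy named in this abstract can be illustrated with a minimal sketch. The abstract does not say which branch output plays the role of x and which plays y, nor what kind of tensors are blended, so the function name `blend`, its argument names, and the plain Python lists below are assumptions made only for illustration.

```python
def blend(primary, auxiliary, beta):
    """Element-wise beta * x + (1 - beta) * y for two equal-length sequences.

    Assumption: `primary` stands in for x and `auxiliary` for y; the
    abstract does not state which branch maps to which symbol.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(primary) != len(auxiliary):
        raise ValueError("inputs must have the same length")
    return [beta * x + (1.0 - beta) * y for x, y in zip(primary, auxiliary)]


# Toy values standing in for the two branch outputs.
x = [0.0, 1.0, 2.0]
y = [4.0, 3.0, 2.0]
halfway = blend(x, y, 0.5)
```

Setting `beta` to 1 returns x unchanged and setting it to 0 returns y, so a single scalar moves the result continuously between the two inputs.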
Submitted 17 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Sparse Focus Network for Multi-Source Remote Sensing Data Classification
Authors:
Xuepeng Jin,
Junyan Lin,
Feng Gao,
Lin Qi,
Yang Zhou
Abstract:
Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to interference from irrelevant information during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network (SF-Net) for multi-source data classification. Sparse attention is employed in the Transformer block for HSI and SAR/LiDAR feature extraction, so that the most useful self-attention values are retained for better feature aggregation. Furthermore, cross-attention is used to enhance multi-source feature interactions, further improving the efficiency of cross-modal feature fusion. Experimental results on the Berlin and Houston2018 datasets highlight the effectiveness of SF-Net, which outperforms existing state-of-the-art methods.
Submitted 3 June, 2024;
originally announced June 2024.
-
Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification
Authors:
Junyan Lin,
Xuepeng Jin,
Feng Gao,
Junyu Dong,
Hui Yu
Abstract:
Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of the spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a new strategy, named Mining Redundant Spectra (MRS). Unlike randomly masking spectral bands, MRS selectively masks them by similarity to increase the reconstruction difficulty. Specifically, a random spectral band is chosen during pretraining, and the selected and highly similar bands are masked. Experimental results demonstrate that employing the MRS strategy during the pretraining stage effectively improves the accuracy of existing MIM-based methods on the Berlin and Houston 2018 datasets.
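The masking rule described in this abstract (choose a random band, then mask it together with highly similar bands) can be sketched as follows. The abstract does not name the similarity measure or how many bands are masked, so cosine similarity, the `num_masked` parameter, and the function names are hypothetical choices for illustration only.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm > 0 else 0.0


def similarity_mask(bands, num_masked, rng):
    """Pick a random anchor band and mask it with its most similar bands.

    bands: list of spectral bands, each a flat list of pixel values.
    Returns (anchor_index, sorted list of masked band indices).
    """
    anchor = rng.randrange(len(bands))
    others = [i for i in range(len(bands)) if i != anchor]
    # Most similar bands first.
    others.sort(key=lambda i: cosine_similarity(bands[anchor], bands[i]), reverse=True)
    masked = sorted([anchor] + others[: num_masked - 1])
    return anchor, masked


# Toy data: bands 0 and 1 are nearly identical, as are bands 2 and 3.
bands = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [3.0, 1.0, 0.5],
    [2.9, 1.2, 0.4],
]
anchor, masked = similarity_mask(bands, num_masked=2, rng=random.Random(0))
```

In this toy setup the anchor band is always masked together with its near-duplicate, so the remaining visible bands carry less redundant information about the masked ones than they would under uniform random masking.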
Submitted 3 June, 2024;
originally announced June 2024.
-
A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems
Authors:
Xin Jin,
Tiejun Lv,
Wei Ni,
Zhipeng Lin,
Qiuming Zhu,
Ekram Hossain,
H. Vincent Poor
Abstract:
Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-plus-noise ratio for radar sensing. Considering the non-convexity of this problem arising from multiplicative coupling of the analog and digital beamforming, we convert the sum-rate maximization into an equivalent weighted mean-square error minimization and apply penalty dual decomposition to decouple the analog and digital beamforming. Specifically, a second-order cone program is first constructed to optimize the fully digital counterpart of the HAD beamforming. Then, the sparsity of the RS architecture is exploited to obtain a low-complexity solution for the HAD beamforming. The convergence and complexity analyses of our algorithm are carried out under the RS architecture. Simulations corroborate that, with the RS architecture, DFRC offers effective communication and sensing and improves energy efficiency by 83.4% and 114.2% with a moderate number of radio frequency chains and phase shifters, compared to the persistently- and fully-connected architectures, respectively.
Submitted 24 April, 2024;
originally announced April 2024.
-
Integrated Sensing and Communication for Edge Inference with End-to-End Multi-View Fusion
Authors:
Xibin Jin,
Guoliang Li,
Shuai Wang,
Miaowen Wen,
Chengzhong Xu,
H. Vincent Poor
Abstract:
Integrated sensing and communication (ISAC) is a promising solution to accelerate edge inference via the dual use of wireless signals. However, this paradigm needs to minimize the inference error and latency under ISAC co-functionality interference, for which the existing ISAC or edge resource allocation algorithms become inefficient, as they ignore the inter-dependency between low-level ISAC designs and high-level inference services. This letter proposes an inference-oriented ISAC (IO-ISAC) scheme, which minimizes upper bounds on end-to-end inference error and latency using multi-objective optimization. The key to our approach is to derive a multi-view inference model that accounts for both the number of observations and the angles of observations, by integrating a half-voting fusion rule and an angle-aware sensing model. Simulation results show that the proposed IO-ISAC outperforms other benchmarks in terms of both accuracy and latency.
Submitted 15 April, 2024;
originally announced April 2024.
-
Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning
Authors:
Ayush Arunachalam,
Ian Kintz,
Suvadeep Banerjee,
Arnab Raha,
Xiankun Jin,
Fei Su,
Viswanathan Pillai Prasanth,
Rubin A. Parekhji,
Suriyaprakash Natarajan,
Kanad Basu
Abstract:
Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.
Submitted 2 April, 2024;
originally announced April 2024.
-
A Stochastic Hybrid Approach to Decentralized Networked Control: Stochastic Network Delays and Poisson Pulsing Attacks
Authors:
Dandan Zhang,
Xin Jin,
Hongye Su
Abstract:
By designing decentralized time-regularized (Zeno-free) event-triggered strategies for the state-feedback control law, this paper considers the stochastic stabilization of a class of networked control systems, where two sources of randomness exist in multiple decentralized networks that operate asynchronously and independently: the communication channels are constrained by stochastic network delays and also by Poisson pulsing denial-of-service (Pp-DoS) attacks. The network time delay is the interval from a transmission instant to the corresponding update instant and is assumed to be a continuous random variable following a certain continuous probability distribution, while the number of attacks is a discrete random variable assumed to follow a Poisson distribution, so the inter-attack time, i.e., the time between two consecutive attack instants, follows an exponential distribution. The considered system is modeled in a stochastic hybrid formalism, where the randomness enters through the jump map into the reset value (directly related to the inter-attack time) of each triggered strategy. By sampling/transmitting state measurements only when needed and simultaneously taking the specific medium access protocols into account, the designed event-triggered strategies are synthesized in a state-based and decentralized form and are robust to stochastic network delays under different trade-off conditions between the minimum inter-event times, the maximum allowable delays (i.e., potentially tolerable delays), and the frequencies of attacks. Using stochastic hybrid tools to combine attack-active parts with attack-over parts, the designed triggered strategies, if designed well according to the actual system needs, can tolerate (be resilient to) the Pp-DoS attacks and stochastic network delays without jeopardizing stability and Zeno-freeness.
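The link this abstract draws between a Poisson-distributed number of attacks and exponentially distributed inter-attack times can be checked with a short simulation. This is only a sketch of a homogeneous Poisson process; the rate, the time horizon, and the function name `sample_attack_instants` are illustrative assumptions and are not taken from the paper.

```python
import random


def sample_attack_instants(rate, horizon, rng):
    """Attack instants of a homogeneous Poisson process on [0, horizon).

    Inter-attack times are drawn i.i.d. from an exponential distribution
    with mean 1 / rate, so the number of attacks in the window is Poisson
    distributed with mean rate * horizon.
    """
    instants = []
    t = rng.expovariate(rate)
    while t < horizon:
        instants.append(t)
        t += rng.expovariate(rate)
    return instants


# Hypothetical attack frequency (2 per time unit) and observation window.
attacks = sample_attack_instants(rate=2.0, horizon=5000.0, rng=random.Random(1))
gaps = [b - a for a, b in zip(attacks, attacks[1:])]
mean_gap = sum(gaps) / len(gaps)
```

With these settings the attack count should land near `rate * horizon` and the mean gap near `1 / rate`, which is the relationship the abstract relies on.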
Submitted 12 June, 2025; v1 submitted 26 January, 2024;
originally announced January 2024.
-
MusicAOG: an Energy-Based Model for Learning and Sampling a Hierarchical Representation of Symbolic Music
Authors:
Yikai Qian,
Tianle Wang,
Xinyi Tong,
Xin Jin,
Duo Xu,
Bo Zheng,
Tiezheng Ge,
Feng Yu,
Song-Chun Zhu
Abstract:
In addressing the challenge of interpretability and generalizability of artificial music intelligence, this paper introduces a novel symbolic representation that amalgamates both explicit and implicit musical information across diverse traditions and granularities. Utilizing a hierarchical and-or graph representation, the model employs nodes and edges to encapsulate a broad spectrum of musical elements, including structures, textures, rhythms, and harmonies. This hierarchical approach expands the representability across various scales of music. This representation serves as the foundation for an energy-based model, uniquely tailored to learn musical concepts through a flexible algorithm framework relying on the minimax entropy principle. Utilizing an adapted Metropolis-Hastings sampling technique, the model enables fine-grained control over music generation. A comprehensive empirical evaluation, contrasting this novel approach with existing methodologies, manifests considerable advancements in interpretability and controllability. This study marks a substantial contribution to the fields of music analysis, composition, and computational musicology.
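For readers unfamiliar with the sampling step, the textbook Metropolis-Hastings procedure for an energy-based model can be sketched as below. This is the generic algorithm with a symmetric random-walk proposal over integers, not the adapted variant the paper describes; the toy energy function and all names are hypothetical.

```python
import math
import random


def metropolis_hastings(energy, start, steps, rng):
    """Sample integers x with probability proportional to exp(-energy(x))."""
    x = start
    samples = []
    for _ in range(steps):
        proposal = x + rng.choice((-1, 1))  # symmetric random-walk proposal
        delta = energy(proposal) - energy(x)
        # Accept with probability min(1, exp(-delta)).
        if delta <= 0 or rng.random() < math.exp(-delta):
            x = proposal
        samples.append(x)
    return samples


def toy_energy(x):
    """Hypothetical energy with its minimum at x = 3."""
    return 0.5 * (x - 3) ** 2


samples = metropolis_hastings(toy_energy, start=0, steps=20000, rng=random.Random(0))
mean_sample = sum(samples) / len(samples)
```

Low-energy states are visited most often, so the samples concentrate around the minimum of the toy energy. Shaping the energy function or the proposal is one generic way such samplers offer control over what is generated.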
Submitted 5 January, 2024;
originally announced January 2024.
-
EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution
Authors:
Yi Xiao,
Qiangqiang Yuan,
Kui Jiang,
Jiang He,
Xianyu Jin,
Liangpei Zhang
Abstract:
Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smoothing issues. Generative adversarial networks have the potential to infer intricate details, but they are prone to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce the Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptually pleasant images. Specifically, different from previous works using a heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and a simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visually pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR
Submitted 30 October, 2023;
originally announced October 2023.
-
Parallel compressive super-resolution imaging with wide field-of-view based on physics enhanced network
Authors:
Xiao-Peng Jin,
An-Dong Xiong,
Wei Zhang,
Xiao-Qing Wang,
Fan Liu,
Chang-Heng Li,
Xu-Ri Yao,
Xue-Feng Liu,
Qing Zhao
Abstract:
Achieving both high-performance and wide field-of-view (FOV) super-resolution imaging has been attracting increasing attention in recent years. However, such a goal suffers from long reconstruction times and huge storage requirements. Parallel compressive imaging (PCI) provides an efficient solution, but the super-resolution quality and imaging speed are strongly dependent on a precise optical transfer function (OTF), the modulation masks, and the reconstruction algorithm. In this work, we propose a wide-FOV parallel compressive super-resolution imaging approach based on a physics-enhanced network. By training the network with the prior OTF of an arbitrary 128x128-pixel region and fine-tuning the network with other OTFs within the remaining regions of the FOV, we realize both mask optimization and super-resolution imaging with a FOV of up to 1020x1500. Numerical simulations and practical experiments demonstrate the effectiveness and superiority of the proposed approach. We achieve high-quality reconstruction with 4x4 times super-resolution enhancement using only three designed masks to reach real-time imaging speed. The proposed approach promotes the technology of rapid imaging for super-resolution and wide FOV, ranging from infrared to terahertz.
Submitted 20 October, 2023;
originally announced October 2023.
-
Make Explicit Calibration Implicit: Calibrate Denoiser Instead of the Noise Model
Authors:
Xin Jin,
Jia-Wen Xiao,
Ling-Hao Han,
Chunle Guo,
Xialei Liu,
Chongyi Li,
Ming-Ming Cheng
Abstract:
Explicit calibration-based methods have dominated RAW image denoising under extremely low-light environments. However, these methods are impeded by several critical limitations: a) the explicit calibration process is both labor- and time-intensive, b) transferring denoisers across different camera models is challenging, and c) the disparity between synthetic and real noise is exacerbated by digital gain. To address these issues, we introduce a groundbreaking pipeline named Lighting Every Darkness (LED), which is effective regardless of the digital gain or the camera sensor. LED eliminates the need for explicit noise model calibration, instead utilizing an implicit fine-tuning process that allows quick deployment and requires minimal data. Structural modifications are also included to reduce the discrepancy between synthetic and real noise without extra computational demands. Our method surpasses existing methods on various camera models, including new ones not in public datasets, with just a few pairs per digital gain and only 0.5% of the typical iterations. Furthermore, LED also allows researchers to focus more on deep learning advancements while still utilizing sensor engineering benefits. Code and related materials can be found at https://srameo.github.io/projects/led-iccv23/
Submitted 25 December, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection
Authors:
Kang Yi,
Jing Xu,
Xiao Jin,
Fu Guo,
Yan-Feng Wu
Abstract:
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information. Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features. However, these features contribute differently to the final saliency results, which raises two issues: 1) how to model discrepant characteristics of RGB images and depth maps; 2) how to fuse these cross-modality features in different stages. In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD. Concretely, we first employ transformer-based and CNN-based architectures as backbones to encode RGB and depth features, respectively. Then, the high-order representations are delicately extracted and embedded into spatial and channel attentions for cross-modality feature fusion in different stages. Specifically, we design a high-order spatial fusion (HOSF) module and a high-order channel fusion (HOCF) module to fuse features of the first two and the last two stages, respectively. Besides, a cascaded pyramid reconstruction network is adopted to progressively decode the fused features in a top-down pathway. Extensive experiments are conducted on seven widely used datasets to demonstrate the effectiveness of the proposed approach. We achieve competitive performance against 24 state-of-the-art methods under four evaluation metrics.
Submitted 3 July, 2023;
originally announced July 2023.
-
Synchronization of multiple rigid body systems: a survey
Authors:
X. Jin,
Daniel W. C. Ho,
Y. Tang
Abstract:
The multi-agent system has been a hot topic in the past few decades owing to its lower cost, higher robustness, and higher flexibility. As a particular multi-agent system, the multiple rigid body system has received growing interest for its wide applications in transportation, aerospace, and ocean exploration. Due to the non-Euclidean configuration space of attitudes and the inherent nonlinearity of the dynamics of rigid body systems, synchronization of multiple rigid body systems is quite challenging. This paper aims to present an overview of the recent progress in synchronization of multiple rigid body systems from the view of two fundamental problems. The first problem focuses on attitude synchronization, while the second one focuses on cooperative motion control in which rotation and translation dynamics are coupled. Finally, a summary and future directions are given in the conclusion.
Submitted 27 August, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy
Authors:
Xiyao Jin,
Yao Hao,
Jessica Hilliard,
Zhehao Zhang,
Maria A. Thomas,
Hua Li,
Abhinav K. Jha,
Geoffrey D. Hugo
Abstract:
To safely deploy deep learning models in the clinic, a quality assurance framework is needed for routine or continuous monitoring of input-domain shift and the models' performance without ground truth contours. In this work, cardiac substructure segmentation was used as an example task to establish a QA framework. A benchmark dataset consisting of Computed Tomography (CT) images along with manual cardiac delineations of 241 patients was collected, including one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed by utilizing a trained Denoising Autoencoder (DAE) and two hand-engineered features. Another Variational Autoencoder (VAE) was also trained to estimate the shape quality of the auto-segmentation results. Using the extracted features from the image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC). The framework was tested across 19 segmentation models to evaluate the generalizability of the entire framework.
As a result, the predicted DSC of the regression models achieved a mean absolute error (MAE) ranging from 0.036 to 0.046 with an averaged MAE of 0.041. When tested on the benchmark dataset, the performances of all segmentation models were not significantly affected by scanning parameters: FOV, slice thickness, and reconstruction kernels. For input images with Poisson noise, CNN-based segmentation models demonstrated a decreased DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.
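The two metrics this abstract reports, the Dice coefficient for segmentation overlap and the mean absolute error of a predicted score, can be written out in a few lines. The sketch below uses flat 0/1 lists as stand-ins for segmentation masks; the function names and the toy numbers are illustrative and are not data from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy masks: 2 overlapping voxels, 3 predicted, 2 in the reference.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 0, 1]
dsc = dice_coefficient(pred_mask, true_mask)  # 2 * 2 / (3 + 2) = 0.8

# Toy predicted vs. measured DSC values for two hypothetical patients.
mae = mean_absolute_error([0.80, 0.90], [0.85, 0.88])
```

An MAE of around 0.04, as reported in the abstract, would mean the regression model's predicted DSC is off by about four hundredths on average from the DSC computed against manual contours.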
Submitted 19 May, 2023;
originally announced May 2023.
-
Semantically Structured Image Compression via Irregular Group-Based Decoupling
Authors:
Ruoyu Feng,
Yixin Gao,
Xin Jin,
Runsen Feng,
Zhibo Chen
Abstract:
Image compression techniques typically focus on compressing rectangular images for human consumption, which results in transmitting redundant content for downstream applications. To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements by selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and do not prevent information interaction among them, wasting bitrate and distorting the reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate savings by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of a group-independent transform that maintains the independence among distinct groups, and we instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream with negligible cost and exhibits superior performance in both visual quality and support for intelligent tasks.
Submitted 2 March, 2025; v1 submitted 4 May, 2023;
originally announced May 2023.
-
An Order-Complexity Model for Aesthetic Quality Assessment of Homophony Music Performance
Authors:
Xin Jin,
Wu Zhou,
Jinyu Wang,
Duo Xu,
Yiqing Rong,
Jialin Sun
Abstract:
Although computational aesthetics evaluation has made certain achievements in many fields, research on music performance remains underexplored. At present, subjective evaluation is still the ultimate method of music aesthetics research, but it consumes a lot of human and material resources. In addition, music performance generated by AI is still mechanical, monotonous, and lacking in beauty. In order to guide the generation of AI music performance and to improve the performance of human performers, this paper uses Birkhoff's aesthetic measure to propose a method for the objective measurement of beauty. The main contributions of this paper are as follows: first, we put forward an objective aesthetic evaluation method to measure the aesthetics of music performance; second, we propose 10 basic music features and 4 aesthetic music features. Experiments show that our method performs well on performance assessment.
Submitted 22 April, 2023;
originally announced April 2023.
-
Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
Authors:
Xin Li,
Bingchen Li,
Xin Jin,
Cuiling Lan,
Zhibo Chen
Abstract:
In recent years, we have witnessed the great advancement of Deep neural networks (DNNs) in image restoration. However, a critical limitation is that they cannot generalize well to real-world degradations with different degrees or types. In this paper, we are the first to propose a novel training strategy for image restoration from the causality perspective, to improve the generalization ability of DNNs for unknown degradations. Our method, termed Distortion Invariant representation Learning (DIL), treats each distortion type and degree as one specific confounder, and learns the distortion-invariant representation by eliminating the harmful confounding effect of each degradation. We derive our DIL with the back-door criterion in causality by modeling the interventions of different distortions from the optimization perspective. Particularly, we introduce counterfactual distortion augmentation to simulate the virtual distortion types and degrees as the confounders. Then, we instantiate the intervention of each distortion with a virtual model updating based on corresponding distorted images, and eliminate them from the meta-learning perspective. Extensive experiments demonstrate the effectiveness of our DIL on the generalization capability for unseen distortion types and degrees. Our code will be available at https://github.com/lixinustc/Causal-IR-DIL.
Submitted 31 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
QVRF: A Quantization-error-aware Variable Rate Framework for Learned Image Compression
Authors:
Kedeng Tong,
Yaojun Wu,
Yue Li,
Kai Zhang,
Li Zhang,
Xin Jin
Abstract:
Learned image compression has exhibited promising compression performance, but variable bitrates over a wide range remain a challenge. State-of-the-art variable rate methods sacrifice model performance and require numerous additional parameters. In this paper, we present a Quantization-error-aware Variable Rate Framework (QVRF) that utilizes a univariate quantization regulator a to achieve wide-range variable rates within a single model. Specifically, QVRF defines a quantization regulator vector coupled with predefined Lagrange multipliers to control the quantization error of all latent representations for discrete variable rates. Additionally, the reparameterization method makes QVRF compatible with a round quantizer. Exhaustive experiments demonstrate that existing fixed-rate VAE-based methods equipped with QVRF can achieve wide-range continuous variable rates within a single model without significant performance degradation. Furthermore, QVRF outperforms contemporary variable-rate methods in rate-distortion performance with minimal additional parameters.
Submitted 10 March, 2023;
originally announced March 2023.
-
An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores
Authors:
Xin Jin,
Wu Zhou,
Jinyu Wang,
Duo Xu,
Yiqing Rong,
Shuai Cui
Abstract:
Computational aesthetics evaluation has made great achievements in the field of visual arts, but the research work on music still needs to be explored. Although the existing work of music generation is very substantial, the quality of music score generated by AI is relatively poor compared with that created by human composers. The music scores created by AI are usually monotonous and devoid of emotion. Based on Birkhoff's aesthetic measure, this paper proposes an objective quantitative evaluation method for homophony music score aesthetic quality assessment. The main contributions of our work are as follows: first, we put forward a homophony music score aesthetic model to objectively evaluate the quality of music score as a baseline model; second, we put forward eight basic music features and four music aesthetic features.
Submitted 14 January, 2023;
originally announced January 2023.
-
A perspective on Attitude Control Issues and Techniques
Authors:
Dandan Zhang,
Xin Jin,
Hongye Su
Abstract:
This paper reviews attitude control problems for rigid-body systems, starting from the attitude representations for rigid-body kinematics. The highly redundant rotation matrix, the most fundamental representation, defines the attitude orientation globally and uniquely using 9 parameters, without any singularities; the minimal 3-parameter Euler angles and (modified) Rodrigues parameters define the attitude orientation neither globally nor uniquely, with the former exhibiting kinematical singularity and gimbal lock and the latter two exhibiting geometrical singularity; the once-redundant axis-angle and unit quaternion representations define the attitude rotation globally but not uniquely using 4 parameters, with the former not appropriate for defining very small or very large rotations and the latter showing the unwinding phenomenon despite its reduced computational burden. In addition, we explore the relationships among those attitude representations, including the connections among gimbal lock, the unwinding phenomenon and a nowhere dense set of zero Lebesgue measure. Based on attitude representations, we analyze different attitude control laws, almost global and global attitude control, nominal and general robustness, as well as the technical tools.
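The non-uniqueness of the unit quaternion mentioned in the abstract above can be checked numerically: q and -q map to the same rotation matrix. The sketch below uses the standard quaternion-to-matrix formula with a scalar-first (w, x, y, z) convention; the helper names are illustrative and are not taken from the paper.

```python
import math


def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize to a unit quaternion
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def matrices_close(a, b, tol=1e-12):
    """Element-wise comparison of two 3x3 matrices."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(3) for j in range(3))


# A 90-degree rotation about the z-axis, and its antipodal quaternion.
angle = math.pi / 2
q = (math.cos(angle / 2), 0.0, 0.0, math.sin(angle / 2))
q_neg = tuple(-c for c in q)

R_pos = quat_to_rotmat(q)
R_neg = quat_to_rotmat(q_neg)
same_rotation = matrices_close(R_pos, R_neg)  # True: q and -q give one attitude
```

Because every entry of the matrix is quadratic in the quaternion components, the sign flip cancels. This double cover is why a controller that treats q and -q as distinct targets can rotate the long way round, which is the unwinding phenomenon the abstract refers to.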
Submitted 30 June, 2022;
originally announced June 2022.
-
Robust Event Triggering Control for Lateral Dynamics of Intelligent Vehicles with Designable Inter-event Times
Authors:
Xing Chu,
Zhi Liu,
Lei Mao,
Xin Jin,
Zhaoxia Peng,
Guoguang Wen
Abstract:
In this brief, an improved event-triggered update mechanism (ETM) for the linear quadratic regulator is proposed to solve the lateral motion control problem of intelligent vehicles under bounded disturbances. Based on a novel event function using a clock-like variable to determine the triggering time, we further introduce two new design parameters to improve control performance. Distinct from existing event-based control mechanisms, the inter-event times (IETs) derived from the above control framework are designable, meaning that the proposed ETM can be deployed on practical vehicles more easily and effectively. In addition, the improved IETs-designable ETM features a global robust event-separation property that is essential for practical lateral motion control of vehicles subject to diverse disturbances. Theoretical analysis proves the feasibility and stability of the proposed control strategy for trajectory tracking under bounded disturbances. Finally, simulation results verify the theoretical results and show the advantages of the proposed control strategy.
Submitted 14 March, 2022;
originally announced March 2022.
-
SADN: Learned Light Field Image Compression with Spatial-Angular Decorrelation
Authors:
Kedeng Tong,
Xin Jin,
Chen Wang,
Fan Jiang
Abstract:
Light field images are becoming one of the most promising media types for immersive video applications. In this paper, we propose a novel end-to-end spatial-angular-decorrelated network (SADN) for high-efficiency light field image compression. Different from existing methods that exploit either spatial or angular consistency in the light field image, SADN decouples the angular and spatial information by dilation convolution and stride convolution in spatial-angular interaction, and performs feature fusion to compress spatial and angular information jointly. To train a stable and robust algorithm, a large-scale dataset consisting of 7549 light field images is proposed and built. The proposed method provides 2.137 times and 2.849 times higher compression efficiency relative to H.266/VVC and H.265/HEVC inter coding, respectively. It also outperforms the end-to-end image compression networks by an average of 79.6% bitrate saving with much higher subjective quality and light field consistency.
Submitted 22 February, 2022;
originally announced February 2022.
-
Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression
Authors:
Zongyu Guo,
Runsen Feng,
Zhizheng Zhang,
Xin Jin,
Zhibo Chen
Abstract:
Neural video codecs have demonstrated great potential in video transmission and storage applications. Existing neural hybrid video coding approaches rely on optical flow or Gaussian-scale flow for prediction, which cannot support fine-grained adaptation to diverse motion content. Towards more content-adaptive prediction, we propose a novel cross-scale prediction module that achieves more effective motion compensation. Specifically, on the one hand, we produce a reference feature pyramid as prediction sources and then transmit cross-scale flows that leverage the feature scale to control the precision of prediction. On the other hand, for the first time, a weighted prediction mechanism is introduced even if only a single reference frame is available, which can help synthesize a fine prediction result by transmitting cross-scale weight maps. In addition to the cross-scale prediction module, we further propose a multi-stage quantization strategy, which improves the rate-distortion performance with no extra computational penalty during inference. We show the encouraging performance of our efficient neural video codec (ENVC) on several benchmark datasets. In particular, the proposed ENVC can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on UVG dataset for the low-latency mode. We also analyze in detail the effectiveness of the cross-scale prediction module in handling various video content, and provide a comprehensive ablation study to analyze those important components. Test code is available at https://github.com/USTC-IMCL/ENVC .
Submitted 15 March, 2023; v1 submitted 25 December, 2021;
originally announced December 2021.
-
A Close Look at Few-shot Real Image Super-resolution from the Distortion Relation Perspective
Authors:
Xin Li,
Xin Jin,
Jun Fu,
Xiaoyuan Yu,
Bei Tong,
Zhibo Chen
Abstract:
Collecting large amounts of distorted/clean image pairs in the real world is non-trivial, which seriously limits the practical applications of supervised learning-based methods on real-world image super-resolution (RealSR). Previous works usually address this problem by leveraging unsupervised learning-based technologies to alleviate the dependency on paired training samples. However, these methods typically suffer from unsatisfactory texture synthesis due to the lack of supervision of clean images. To overcome this problem, we are the first to have a close look at the under-explored direction for RealSR, i.e., few-shot real-world image super-resolution, which aims to tackle the challenging RealSR problem with few-shot distorted/clean image pairs. Under this brand-new scenario, we propose Distortion Relation guided Transfer Learning (DRTL) for the few-shot RealSR by transferring the rich restoration knowledge from auxiliary distortions (i.e., synthetic distortions) to the target RealSR under the guidance of distortion relation. Concretely, DRTL builds a knowledge graph to capture the distortion relation between auxiliary distortions and target distortion (i.e., real distortions in RealSR). Based on the distortion relation, DRTL adopts a gradient reweighting strategy to guide the knowledge transfer process between auxiliary distortions and target distortions. In this way, DRTL could quickly learn the most relevant knowledge from the synthetic distortions for the target distortion. We instantiate DRTL with two commonly-used transfer learning paradigms, including pre-training and meta-learning pipelines, to realize a distortion relation-aware Few-shot RealSR. Extensive experiments on multiple benchmarks and thorough ablation studies demonstrate the effectiveness of our DRTL.
Submitted 18 April, 2023; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Mid-wave infrared super-resolution imaging based on compressive calibration and sampling
Authors:
Xiao-Peng Jin,
Qing Zhao,
Xue-Feng Liu,
An-Dong Xiong
Abstract:
Mid-wave infrared (MWIR) cameras with large pixel counts are extremely expensive compared with their counterparts in visible light; thus, super-resolution imaging (SRI) for MWIR by increasing imaging pixels has been a research hotspot in recent years. Over the last decade, with the extensive investigation of the compressed sensing (CS) method, focal plane array (FPA) based compressive imaging in MWIR has developed rapidly for SRI. This paper presents long-distance super-resolution FPA compressive imaging in MWIR with an improved calibration method and imaging effect. Using CS, we measure and calculate the calibration matrix of the optical system efficiently and precisely, which improves the imaging contrast and signal-to-noise ratio (SNR) compared with previous work. We also achieved 4x4 times super-resolution reconstruction of long-distance objects, which reaches the limit of the system design in our experiment.
Submitted 8 September, 2021;
originally announced September 2021.
-
RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification
Authors:
Yanfeng Wu,
Chenkai Guo,
Junan Zhao,
Xiao Jin,
Jing Xu
Abstract:
Convolutional neural network (CNN) based approaches have shown great success for speaker verification (SV) tasks, where modeling long temporal context and reducing information loss of speaker characteristics are two important challenges significantly affecting the verification performance. Previous works have introduced dilated convolution and multi-scale aggregation methods to address the above challenges. However, such methods still struggle to make full use of some valuable information, which makes it difficult to substantially improve the verification performance. To address the above issues, we construct a novel CNN-based architecture for SV, called RSKNet-MTSP, where a residual selective kernel block (RSKBlock) and a multiple time-scale statistics pooling (MTSP) module are first proposed. The RSKNet-MTSP can capture both long temporal context and neighbouring information, and gather more speaker-discriminative information from multi-scale features. In order to design a portable model for real applications with limited resources, we then present a lightweight version of RSKNet-MTSP, namely RSKNet-MTSP-L, which employs a combination technique associating the depthwise separable convolutions with low-rank factorization of weight matrices. Extensive experiments are conducted on two public SV datasets, VoxCeleb and Speaker in the Wild (SITW). The results demonstrate that 1) RSKNet-MTSP outperforms the state-of-the-art deep embedding architectures by at least 9%-26% in all test sets; 2) RSKNet-MTSP-L achieves competitive performance compared with baseline models with 17%-39% fewer network parameters. The ablation experiments further illustrate that our proposed approaches can achieve substantial improvement over prior methods.
Submitted 30 August, 2021;
originally announced August 2021.
-
Integrated Decision and Control at Multi-Lane Intersections with Mixed Traffic Flow
Authors:
Jianhua Jiang,
Yangang Ren,
Yang Guan,
Shengbo Eben Li,
Yuming Yin,
Xiaoping Jin
Abstract:
Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most current research focuses on simplified intersections considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take account of realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians and bicycles, respectively, and formulate the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with large-scale mixed traffic participants and set practical traffic light phases. The simulation results indicate that the trained decision and control policy can well balance safety and tracking performance. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.
Submitted 30 August, 2021;
originally announced August 2021.
-
NeuralSound: Learning-based Modal Sound Synthesis With Acoustic Transfer
Authors:
Xutong Jin,
Sheng Li,
Guoping Wang,
Dinesh Manocha
Abstract:
We present a novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and an end-to-end sound radiation network for acoustic transfer. Our mixed vibration solver consists of a 3D sparse convolution network and a Locally Optimal Block Preconditioned Conjugate Gradient module (LOBPCG) for iterative optimization. Moreover, we highlight the correlation between a standard modal vibration solver and our network architecture. Our radiation network predicts the Far-Field Acoustic Transfer maps (FFAT Maps) from the surface vibration of the object. The overall running time of our learning method for any new object is less than one second on a GTX 3080 Ti GPU while maintaining a high sound quality close to the ground truth that is computed using standard numerical methods. We also evaluate the numerical accuracy and perceptual accuracy of our sound synthesis approach on different objects corresponding to various materials.
Submitted 28 May, 2022; v1 submitted 16 August, 2021;
originally announced August 2021.
-
Joint Secure Design of Downlink and D2D Cooperation Strategies for Multi-User Systems
Authors:
Seok-Hwan Park,
Xianglan Jin
Abstract:
This work studies the role of inter-user device-to-device (D2D) cooperation for improving physical-layer secret communication in multi-user downlink systems. It is assumed that there are out-of-band D2D channels, on each of which a selected legitimate user transmits an amplified version of the received downlink signal to other legitimate users. A key technical challenge for designing such systems is that eavesdroppers can overhear downlink as well as D2D cooperation signals. We tackle the problem of jointly optimizing the downlink precoding, artificial noise covariance, and amplification coefficients that maximize the minimum rate. An iterative alternating optimization algorithm is proposed based on the matrix fractional programming. Numerical results confirm the performance gains of the proposed D2D cooperation scheme compared to benchmark secret communication schemes.
Submitted 13 April, 2021;
originally announced April 2021.
-
Synthesizing MR Image Contrast Enhancement Using 3D High-resolution ConvNets
Authors:
Chao Chen,
Catalina Raymond,
Bill Speier,
Xinyu Jin,
Timothy F. Cloughesy,
Dieter Enzmann,
Benjamin M. Ellingson,
Corey W. Arnold
Abstract:
Objective: Gadolinium-based contrast agents (GBCAs) have been widely used to better visualize disease in brain magnetic resonance imaging (MRI). However, gadolinium deposition within the brain and body has raised safety concerns about the use of GBCAs. Therefore, the development of novel approaches that can decrease or even eliminate GBCA exposure while providing similar contrast information would be of significant use clinically. Methods: In this work, we present a deep learning based approach for contrast-enhanced T1 synthesis on brain tumor patients. A 3D high-resolution fully convolutional network (FCN), which maintains high resolution information through processing and aggregates multi-scale information in parallel, is designed to map pre-contrast MRI sequences to contrast-enhanced MRI sequences. Specifically, three pre-contrast MRI sequences, T1, T2 and apparent diffusion coefficient map (ADC), are utilized as inputs and the post-contrast T1 sequences are utilized as target output. To alleviate the data imbalance problem between normal tissues and the tumor regions, we introduce a local loss to improve the contribution of the tumor regions, which leads to better enhancement results on tumors. Results: Extensive quantitative and visual assessments are performed, with our proposed model achieving a PSNR of 28.24 dB in the brain and 21.2 dB in tumor regions. Conclusion and Significance: Our results suggest the potential of substituting GBCAs with synthetic contrast images generated via deep learning. Code is available at https://github.com/chenchao666/Contrast-enhanced-MRI-Synthesis
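For readers unfamiliar with the PSNR figures quoted in the abstract above, peak signal-to-noise ratio is conventionally defined as 10·log10(MAX² / MSE). The sketch below is a generic illustration on flat lists of intensities; the function name, the `max_value` default and the toy data are assumptions, and it does not reproduce the paper's evaluation protocol or its brain and tumor region masks.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    `max_value` is the assumed peak intensity (1.0 for normalized images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Toy example: four normalized intensities with two small errors of 0.1.
ref = [0.0, 0.5, 1.0, 0.5]
est = [0.1, 0.5, 0.9, 0.5]
value = psnr(ref, est)  # MSE = 0.005, so roughly 23 dB
```

A region-restricted score, such as the tumor-only figure in the abstract, would apply the same formula to the voxels selected by a mask.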
Submitted 16 July, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Plane Spiral OAM Mode-Group Based MIMO Communications: An Experimental Study
Authors:
Xiaowen Xiong,
Shilie Zheng,
Zelin Zhu,
Yuqi Chen,
Hongzhe Shi,
Bingchen Pan,
Cheng Ren,
Xianbin Yu,
Xiaofeng Jin,
Wei E. I. Sha,
Xianmin Zhang
Abstract:
Spatial division multiplexing using conventional orbital angular momentum (OAM) has become a well-known physical layer transmission method over the past decade. The mode-group (MG) formed by superposing specific single-mode plane spiral OAM (PSOAM) waves has been proved to be a flexible beamforming method to achieve azimuthal pattern diversity, and it inherits the spiral phase distribution of the conventional OAM wave. Thus, it possesses both beam directionality and vorticity. In this paper, we show and verify experimentally, for the first time, a novel PSOAM MG based multiple-in-multiple-out (MIMO) communication link (MG-MIMO) in a line-of-sight (LoS) scenario. A compact multi-mode PSOAM antenna is demonstrated experimentally to generate multiple independently controllable PSOAM waves, which can be used for constructing MGs. After several proof-of-principle tests, it has been verified that the beam directionality gain of the MG can improve the received signal-to-noise ratio (SNR) level in an actual system, while the vorticity can provide another degree of freedom (DoF) to reduce the spatial correlation of the MIMO system. Furthermore, a tentative long-distance transmission experiment operated at 10.2 GHz has been performed successfully at a distance of 50 m with a single-way spectrum efficiency of 3.7 bits/s/Hz/stream. The proposed MG-MIMO may have potential in the long-distance LoS back-haul scenario.
Submitted 11 March, 2021;
originally announced March 2021.
-
Learned Block-based Hybrid Image Compression
Authors:
Yaojun Wu,
Xin Li,
Zhizheng Zhang,
Xin Jin,
Zhibo Chen
Abstract:
Recent works on learned image compression perform encoding and decoding processes in a full-resolution manner, resulting in two problems when deployed for practical applications. First, parallel acceleration of the autoregressive entropy model cannot be achieved due to serial decoding. Second, full-resolution inference often causes the out-of-memory (OOM) problem with limited GPU resources, especially for high-resolution images. Block partition is a good design choice to handle the above issues, but it brings about new challenges in reducing the redundancy between blocks and eliminating block effects. To tackle the above challenges, this paper provides a learned block-based hybrid image compression (LBHIC) framework. Specifically, we introduce explicit intra prediction into a learned image compression framework to utilize the relation among adjacent blocks. Superior to context modeling by linear weighting of neighbor pixels in traditional codecs, we propose a contextual prediction module (CPM) to better capture long-range correlations by utilizing strip pooling to extract the most relevant information in the neighboring latent space, thus achieving effective information prediction. Moreover, to alleviate blocking artifacts, we further propose a boundary-aware postprocessing module (BPM) with the edge importance taken into account. Extensive experiments demonstrate that the proposed LBHIC codec outperforms VVC, with a bit-rate saving of 4.1%, and reduces the decoding time by approximately 86.7% compared with that of state-of-the-art learned image compression methods.
Submitted 11 October, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Learning Omni-frequency Region-adaptive Representations for Real Image Super-Resolution
Authors:
Xin Li,
Xin Jin,
Tao Yu,
Yingxue Pang,
Simeng Sun,
Zhizheng Zhang,
Zhibo Chen
Abstract:
Traditional single image super-resolution (SISR) methods that focus on solving single and uniform degradation (i.e., bicubic down-sampling) typically suffer from poor performance when applied to real-world low-resolution (LR) images due to the complicated realistic degradations. The key to solving this more challenging real image super-resolution (RealSR) problem lies in learning feature representations that are both informative and content-aware. In this paper, we propose an Omni-frequency Region-adaptive Network (ORNet) to address both challenges, where we refer to the features of all low, middle and high frequencies as omni-frequency features. Specifically, we start from the frequency perspective and design a Frequency Decomposition (FD) module to separate different frequency components and comprehensively compensate for the information lost in real LR images. Then, considering that different regions of a real LR image lose different frequency information, we further design a Region-adaptive Frequency Aggregation (RFA) module that leverages dynamic convolution and spatial attention to adaptively restore frequency components for different regions. Extensive experiments demonstrate the effectiveness and scenario-agnostic nature of our ORNet for RealSR.
Submitted 10 January, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
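As a rough illustration of what separating an image into low, middle and high frequency components can look like, the sketch below splits a grayscale image with radial masks in the Fourier domain. This is not the FD module described in the abstract above, which is a learned network component whose details are not given here; the function name `split_frequency_bands` and the cut-off values `low_cut` and `high_cut` are hypothetical choices made only for this example.

```python
import numpy as np


def split_frequency_bands(image, low_cut=0.1, high_cut=0.3):
    """Split a 2-D grayscale image into low, middle and high frequency bands.

    The cut-offs are in cycles per pixel and are illustrative assumptions,
    not values taken from any paper.
    """
    height, width = image.shape
    # Radial distance of every FFT bin from the zero-frequency (DC) bin.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    spectrum = np.fft.fft2(image)
    # Three disjoint masks that together cover the whole spectrum.
    masks = (
        radius < low_cut,
        (radius >= low_cut) & (radius < high_cut),
        radius >= high_cut,
    )
    # Each band is the inverse transform of the masked spectrum.
    return tuple(np.real(np.fft.ifft2(spectrum * mask)) for mask in masks)


# Usage: the three bands sum back to the original image.
rng = np.random.default_rng(0)
demo_image = rng.random((32, 32))
low, mid, high = split_frequency_bands(demo_image)
reconstruction = low + mid + high
```

Because the masks partition the spectrum, adding the three bands recovers the input up to floating-point error, which makes the decomposition lossless before any per-band processing.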
-
Grid-Interactive Multi-Zone Building Control Using Reinforcement Learning with Global-Local Policy Search
Authors:
Xiangyu Zhang,
Rohit Chintala,
Andrey Bernstein,
Peter Graf,
Xin Jin
Abstract:
In this paper, we develop a grid-interactive multi-zone building controller based on a deep reinforcement learning (RL) approach. The controller is designed to facilitate building operation during normal conditions and demand response events, while ensuring occupant comfort and energy efficiency. We leverage a continuous action space RL formulation and devise a two-stage global-local RL training framework. In the first stage, a fast global policy search is performed using a gradient-free RL algorithm. In the second stage, local fine-tuning is conducted using a policy gradient method. In contrast to the state-of-the-art model predictive control (MPC) approach, the proposed RL controller does not require complex computation during real-time operation and can adapt to non-linear building models. We illustrate the controller performance numerically using a five-zone commercial building.
Submitted 13 October, 2020;
originally announced October 2020.
-
FAN: Frequency Aggregation Network for Real Image Super-resolution
Authors:
Yingxue Pang,
Xin Li,
Xin Jin,
Yaojun Wu,
Jianzhao Liu,
Sen Liu,
Zhibo Chen
Abstract:
Single image super-resolution (SISR) aims to recover the high-resolution (HR) image from its low-resolution (LR) input image. With the development of deep learning, SISR has achieved great progress. However, it is still a challenge to restore real-world LR images with complicated authentic degradations. Therefore, we propose FAN, a frequency aggregation network, to address the real-world image super-resolution problem. Specifically, we extract different frequencies of the LR image and pass them individually to a channel attention-grouped residual dense network (CA-GRDB) to output the corresponding feature maps. We then aggregate these residual dense feature maps adaptively to recover the HR image with enhanced details and textures. We conduct extensive quantitative and qualitative experiments to verify that our FAN performs well on the real image super-resolution task of the AIM 2020 challenge. According to the released final results, our team SR-IM achieved fourth place on the X4 track with a PSNR of 31.1735 and an SSIM of 0.8728.
Submitted 30 September, 2020;
originally announced September 2020.
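For readers unfamiliar with the PSNR figure quoted in the FAN abstract above, the sketch below computes the standard peak signal-to-noise ratio, 10 log10(MAX^2 / MSE), between a reference image and an estimate. It assumes 8-bit images with a peak value of 255; the challenge's exact evaluation protocol (color space, border cropping) is not stated in the abstract, so this should not be read as a reproduction of the reported number. The function name `psnr` and the `max_value` parameter are choices made for this example.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    # Mean squared error over all pixels (and channels, if present).
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


# Usage: a constant error of 25.5 on a 0..255 scale.
reference_image = np.zeros((8, 8))
estimated_image = np.full((8, 8), 25.5)
score_db = psnr(reference_image, estimated_image)
```

Higher values indicate a closer match to the reference, and a uniform error of one tenth of the peak value corresponds to 20 dB.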