Search | arXiv e-print repository

TransDF: Time-Series Forecasting Needs Transformed Label Alignment

Authors: Hao Wang, Licheng Pan, Zhichao Chen, Xu Chen, Qingyang Dai, Lei Wang, Haoxuan Li, Zhouchen Lin

Abstract: Training time-series forecasting models presents unique challenges in designing effective learning objectives. Existing methods predominantly utilize the temporal mean squared error, which faces two critical challenges: (1) label autocorrelation, which leads to bias from the label sequence likelihood; (2) excessive amount of tasks, which increases with the forecast horizon and complicates optimization. To address these challenges, we propose Transform-enhanced Direct Forecast (TransDF), which transforms the label sequence into decorrelated components with discriminated significance. Models are trained to align the most significant components, thereby effectively mitigating label autocorrelation and reducing task amount. Extensive experiments demonstrate that TransDF achieves state-of-the-art performance and is compatible with various forecasting models. Code is available at https://anonymous.4open.science/r/TransDF-88CF.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.04203 [pdf, ps, other]

doi 10.1145/3721238.3730756

ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition

Authors: Zhiping Qiu, Yitong Jin, Yuan Wang, Yi Shi, Chongwu Wang, Chao Tan, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

Abstract: The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body fine-grained instrument performance motion generation solely from audio. To emphasize the interactive nature of the instrument performance, we introduce Hand Interactive Contact Loss (HICL) and Bow Interactive Contact Loss (BICL), which effectively guarantee the authenticity of the interplay. Moreover, to better evaluate whether the generated motions align with the semantic context of the music audio, we design novel metrics specifically for string instrument performance motion generation, including finger-contact distance, bow-string distance, and bowing score. Extensive evaluations and ablation studies are conducted to validate the efficacy of the proposed methods. In addition, we put forward a motion generation dataset SPD-GEN, collated and normalized from the MoCap dataset SPD. As demonstrated, ELGAR has shown great potential in generating instrument performance motions with complicated and fast interactions, which will promote further development in areas such as animation, music education, interactive art creation, etc.

Submitted 1 July, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

Journal ref: SIGGRAPH 2025

arXiv:2504.19091 [pdf, other]

A Tutorial on MIMO-OFDM ISAC: From Far-Field to Near-Field

Authors: Qianglong Dai, Yong Zeng, Huizhi Wang, Changsheng You, Chao Zhou, Hongqiang Cheng, Xiaoli Xu, Shi Jin, A. Lee Swindlehurst, Yonina C. Eldar, Robert Schober, Rui Zhang, Xiaohu You

Abstract: Integrated sensing and communication (ISAC) is one of the key usage scenarios for future sixth-generation (6G) mobile communication networks, where communication and sensing (C&S) services are simultaneously provided through shared wireless spectrum, signal processing modules, hardware, and network infrastructure. Such an integration is strengthened by the technology trends in 6G, such as denser network nodes, larger antenna arrays, wider bandwidths, higher frequency bands, and more efficient utilization of spectrum and hardware resources, which incentivize and empower enhanced sensing capabilities. As the dominant waveform used in contemporary communication systems, orthogonal frequency division multiplexing (OFDM) is still expected to be a very competitive technology for 6G, rendering it necessary to thoroughly investigate the potential and challenges of OFDM ISAC. Thus, this paper aims to provide a comprehensive tutorial overview of ISAC systems enabled by large-scale multi-input multi-output (MIMO) and OFDM technologies and to discuss their fundamental principles, advantages, and enabling signal processing methods. To this end, a unified MIMO-OFDM ISAC system model is first introduced, followed by four frameworks for estimating parameters across the spatial, delay, and Doppler domains, including parallel one-domain, sequential one-domain, joint two-domain, and joint three-domain parameter estimation. Next, sensing algorithms and performance analyses are presented in detail for far-field scenarios where uniform plane wave (UPW) propagation is valid, followed by their extensions to near-field scenarios where uniform spherical wave (USW) characteristics need to be considered. Finally, this paper points out open challenges and outlines promising avenues for future research on MIMO-OFDM ISAC.

Submitted 26 April, 2025; originally announced April 2025.

arXiv:2504.17816 [pdf, other]

Subject-driven Video Generation via Disentangled Identity and Motion

Authors: Daneul Kim, Jingxu Zhang, Wonjoon Jin, Sunghyun Cho, Qi Dai, Jaesik Park, Chong Luo

Abstract: We propose to train a subject-driven customized video generation model through decoupling the subject-specific learning from temporal dynamics in zero-shot without additional tuning. A traditional method for video customization that is tuning-free often relies on large, annotated video datasets, which are computationally expensive and require extensive annotation. In contrast to the previous approach, we introduce the use of an image customization dataset directly on training video customization models, factorizing the video customization into two folds: (1) identity injection through image customization dataset and (2) temporal modeling preservation with a small set of unannotated videos through the image-to-video training method. Additionally, we employ random image token dropping with randomized image initialization during image-to-video fine-tuning to mitigate the copy-and-paste issue. To further enhance learning, we introduce stochastic switching during joint optimization of subject-specific and temporal features, mitigating catastrophic forgetting. Our method achieves strong subject consistency and scalability, outperforming existing video customization models in zero-shot settings, demonstrating the effectiveness of our framework.

Submitted 23 April, 2025; originally announced April 2025.

Comments: Project page: https://carpedkm.github.io/projects/disentangled_sub/index.html

arXiv:2502.21036 [pdf, other]

A Demo of Radar Sensing Aided Rotatable Antenna for Wireless Communication System

Authors: Qi Dai, Beixiong Zheng, Qiyao Wang, Xue Xiong, Xiaodan Shao, Lipeng Zhu, Rui Zhang

Abstract: Rotatable antenna (RA) represents a novel antenna architecture that enhances wireless communication system performance by independently or collectively adjusting each antenna's boresight/orientation. In this demonstration, we develop a prototype of radar sensing-aided rotatable antenna that integrates radar sensing with dynamic antenna orientation to enhance wireless communication performance while maintaining low hardware costs. The proposed prototype consists of a transmitter (TX) module and a receiver (RX) module, both of which employ universal software radio peripherals (USRPs) for transmitting and receiving signals. Specifically, the TX utilizes a laser radar to detect the RX's location and conveys the angle of arrival (AoA) information to its antenna servo, which enables the RA to align its boresight direction with the identified RX. Experimental results examine the effectiveness of the proposed prototype and indicate that the RA significantly outperforms the traditional fixed-antenna system in terms of increasing received signal-to-noise ratio (SNR).

Submitted 17 April, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

arXiv:2501.15206 [pdf, other]

doi 10.1103/PhysRevApplied.23.054073

Engineering-Oriented Design of Drift-Resilient MTJ Random Number Generator via Hybrid Control Strategies

Authors: Ran Zhang, Caihua Wan, Yingqian Xu, Xiaohan Li, Raik Hoffmann, Meike Hindenberg, Shiqiang Liu, Dehao Kong, Shilong Xiong, Shikun He, Alptekin Vardar, Qiang Dai, Junlu Gong, Yihui Sun, Zejie Zheng, Thomas Kämpfe, Guoqiang Yu, Xiufeng Han

Abstract: Magnetic Tunnel Junctions (MTJs) have shown great promise as hardware sources for true random number generation (TRNG) due to their intrinsic stochastic switching behavior. However, practical deployment remains challenged by drift in switching probability caused by thermal fluctuations, device aging, and environmental instability. This work presents an engineering-oriented, drift-resilient MTJ-based TRNG architecture, enabled by a hybrid control strategy that combines self-stabilizing feedback with pulse width modulation. A key component is the Downcalibration-2 scheme, which updates the control parameter every two steps using only integer-resolution timing, ensuring excellent statistical quality without requiring bit discarding, pre-characterization, or external calibration. Extensive experimental measurements and numerical simulations demonstrate that this approach maintains stable randomness under dynamic temperature drift, using only simple digital logic. The proposed architecture offers high throughput, robustness, and scalability, making it well-suited for secure hardware applications, embedded systems, and edge computing environments.

Submitted 19 April, 2025; v1 submitted 25 January, 2025; originally announced January 2025.

Comments: 16 pages, 9 figures, data shared at https://doi.org/10.6084/m9.figshare.28680899.v1

arXiv:2501.03689 [pdf, other]

MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation

Authors: Haojie Wei, Jun Yuan, Rui Zhang, Quanyu Dai, Yueguo Chen

Abstract: Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Therefore, existing methods have tried to perform these two tasks simultaneously, so as to leverage the mutually beneficial relationship between both tasks. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use variant models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which addresses the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.

Submitted 7 January, 2025; originally announced January 2025.

arXiv:2412.20083 [pdf, other]

Achieving Full-Bandwidth Sensing Performance with Partial Bandwidth Allocation for ISAC

Authors: Zhiqiang Xiao, Zhiwen Zhou, Qianglong Dai, Yong Zeng, Fei Yang, Yan Chen

Abstract: This letter studies an uplink integrated sensing and communication (ISAC) system using discrete Fourier transform spread orthogonal frequency division multiplexing (DFT-s-OFDM) transmission. We try to answer the following fundamental question: With only a fractional bandwidth allocated to the user with sensing task, can the same delay resolution and unambiguous range be achieved as if all bandwidth were allocated to it? We affirmatively answer the question by proposing a novel two-stage delay estimation (TSDE) method that exploits the following facts: without increasing the allocated bandwidth, higher delay resolution can be achieved via distributed subcarrier allocation compared to its collocated counterpart, while there is a trade-off between delay resolution and unambiguous range by varying the decimation factor of subcarriers. Therefore, the key idea of the proposed TSDE method is to first perform coarse delay estimation with collocated subcarriers to achieve a large unambiguous range, and then use distributed subcarriers with optimized decimation factor to enhance delay resolution while avoiding delay ambiguity. Our analysis shows that the proposed TSDE method can achieve the full-bandwidth delay resolution and unambiguous range, by using only at most half of the full bandwidth, provided that the channel delay spread is less than half of the unambiguous range. Numerical results show the superiority of the proposed method over the conventional method with collocated subcarriers.

Submitted 28 December, 2024; originally announced December 2024.

arXiv:2409.19835 [pdf, other]

MoCoLSK: Modality Conditioned High-Resolution Downscaling for Land Surface Temperature

Authors: Qun Dai, Chunyang Yuan, Yimian Dai, Yuxuan Li, Xiang Li, Kang Ni, Jianhui Xu, Xiangbo Shu, Jian Yang

Abstract: Land Surface Temperature (LST) is a critical parameter for environmental studies, but directly obtaining high spatial resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing. Guided LST downscaling has emerged as an alternative solution to overcome these limitations, but current methods often neglect spatial non-stationarity, and there is a lack of an open-source ecosystem for deep learning methods. In this paper, we propose the Modality-Conditional Large Selective Kernel (MoCoLSK) Network, a novel architecture that dynamically fuses multi-modal data through modality-conditioned projections. MoCoLSK achieves a confluence of dynamic receptive field adjustment and multi-modal feature fusion, leading to enhanced LST prediction accuracy. Furthermore, we establish the GrokLST project, a comprehensive open-source ecosystem featuring the GrokLST dataset, a high-resolution benchmark, and the GrokLST toolkit, an open-source PyTorch-based toolkit encapsulating MoCoLSK alongside 40+ state-of-the-art approaches. Extensive experimental results validate MoCoLSK's effectiveness in capturing complex dependencies and subtle variations within multispectral data, outperforming existing methods in LST downscaling. Our code, dataset, and toolkit are available at https://github.com/GrokCV/GrokLST.

Submitted 2 March, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

Comments: Accepted by IEEE TGRS

arXiv:2406.05389 [pdf]

doi 10.1109/TGRS.2024.3412286

A Deep Learning-Augmented Stand-off Radar Scheme for Rapidly Detecting Tree Defects

Authors: Jiwei Qian, Yee Hui Lee, Kaixuan Cheng, Qiqi Dai, Mohamed Lokman Mohd Yusof, Daryl Lee, Abdulkadir C. Yucel

Abstract: Tree defect detection is crucial for the structural health screening of trees. Existing nondestructive testing (NDT) techniques for tree defect detection require time-consuming and labor-intensive measurement campaigns. This discourages their application for the routine structural health screening of whole populations of managed urban trees. To address this issue, this study proposes a deep-learning augmented stand-off radar scheme for contactless scanning of tree trunks and rapid detection of tree defects. In this scheme, the antenna is moved along a straight trajectory at a distance from the tree trunk to obtain the trunk's B-scan. The obtained raw B-scan is then processed by a signal-processing framework specifically developed for revealing the scattering signatures of defects in B-scan, which achieves a 30 dB and 22 dB increase in the signal-to-clutter and noise ratio of the measurement data of tree trunk samples and living trees, respectively. Finally, the processed B-scan is input into a multilevel feature fusion neural network particularly designed for extracting the signature of the defect in the processed B-scan in real time. The developed scheme's applications to the detection of defects in real fresh-cut tree trunks show that the stand-off radar scheme can detect tree defects with 96% accuracy. This stand-off radar scheme is the first contactless NDT technique for tree defect detection while operated on a straight trajectory and potentially can be integrated into the routine tree inspection workflow which is part of urban tree management.

Submitted 8 June, 2024; originally announced June 2024.

Comments: Accepted and to be published in IEEE Transactions on Geoscience and Remote Sensing

arXiv:2405.16850 [pdf, other]

UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation

Authors: Runzhao Yang, Yinda Chen, Zhihong Zhang, Xiaoyu Liu, Zongren Li, Kunlun He, Zhiwei Xiong, Jinli Suo, Qionghai Dai

Abstract: In the field of medical image compression, Implicit Neural Representation (INR) networks have shown remarkable versatility due to their flexible compression ratios, yet they are constrained by a one-to-one fitting approach that results in lengthy encoding times. Our novel method, "UniCompress", innovatively extends the compression capabilities of INR by being the first to compress multiple medical data blocks using a single INR network. By employing wavelet transforms and quantization, we introduce a codebook containing frequency domain information as a prior input to the INR network. This enhances the representational power of INR and provides distinctive conditioning for different image blocks. Furthermore, our research introduces a new technique for the knowledge distillation of implicit representations, simplifying complex model knowledge into more manageable formats to improve compression ratios. Extensive testing on CT and electron microscopy (EM) datasets has demonstrated that UniCompress outperforms traditional INR methods and commercial compression solutions like HEVC, especially in complex and high compression scenarios. Notably, compared to existing INR techniques, UniCompress achieves a 4 to 5 times increase in compression speed, marking a significant advancement in the field of medical image compression. Codes will be publicly available.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.07551 [pdf, other]

Event-Enhanced Snapshot Compressive Videography at 10K FPS

Authors: Bo Zhang, Jinli Suo, Qionghai Dai

Abstract: Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and only frames at corresponding temporal intervals can be reconstructed, while the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid "intensity+event" imaging scheme by incorporating an event camera into a video SCI setup. Our proposed system consists of a dual-path optical setup to record the coded intensity measurement and intermediate event signals simultaneously, which is compact and photon-efficient by collecting the half photons discarded in conventional video SCI. Correspondingly, we developed a dual-branch Transformer utilizing the reciprocal relationship between two data modes to decode dense video frames. Extensive experiments on both simulated and real-captured data demonstrate our superiority to state-of-the-art video SCI and video frame interpolation (VFI) methods. Benefiting from the new hybrid design leveraging both intrinsic redundancy in videos and the unique feature of event cameras, we achieve high-quality videography at 0.1ms time intervals with a low-cost CMOS image sensor working at 24 FPS.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.15853 [pdf]

An edge detection-based deep learning approach for tear meniscus height measurement

Authors: Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

Abstract: Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask labels less affected by subjective factors with enhanced efficiency compared to previous annotation approaches. For improved segmentation of the pupil and tear meniscus areas, the convolutional neural network Inceptionv3 was first implemented as an image quality assessment model, effectively identifying higher-quality images with an accuracy of 98.224%. Subsequently, by using the generated labels, various algorithms, including Unet, ResUnet, Deeplabv3+FcnResnet101, Deeplabv3+FcnResnet50, FcnResnet50, and FcnResnet101 were trained, with Unet demonstrating the best performance. Finally, Unet was used for automatic pupil and tear meniscus segmentation to locate the center of the pupil and calculate TMH, respectively. An evaluation of the mask quality predicted by Unet indicated a Mean Intersection over Union of 0.9362, a recall of 0.9261, a precision of 0.9423, and an F1-Score of 0.9326. Additionally, the TMH predicted by the model was assessed, with the fitting curve represented as y = 0.982x - 0.862, an overall correlation coefficient of r^2 = 0.961, and an accuracy of 94.80% (237/250). In summary, the algorithm can automatically screen images based on their quality, segment the pupil and tear meniscus areas, and automatically measure TMH. Measurement results using the AI algorithm demonstrate a high level of consistency with manual measurements, offering significant support to clinical doctors in diagnosing dry eye disease.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 22 pages, 5 figures

arXiv:2311.13134 [pdf, other]

Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos

Authors: Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai

Abstract: The demand for compact cameras capable of recording high-speed scenes with high resolution is steadily increasing. However, achieving such capabilities often entails high bandwidth requirements, resulting in bulky, heavy systems unsuitable for low-capacity platforms. To address this challenge, leveraging a coded exposure setup to encode a frame sequence into a blurry snapshot and subsequently retrieve the latent sharp video presents a lightweight solution. Nevertheless, restoring motion from blur remains a formidable challenge due to the inherent ill-posedness of motion blur decomposition, the intrinsic ambiguity in motion direction, and the diverse motions present in natural videos. In this study, we propose a novel approach to address these challenges by combining the classical coded exposure imaging technique with the emerging implicit neural representation for videos. We strategically embed motion direction cues into the blurry image during the imaging process. Additionally, we develop a novel implicit neural representation based blur decomposition network to sequentially extract the latent video frames from the blurry image, leveraging the embedded motion direction cues. To validate the effectiveness and efficiency of our proposed framework, we conduct extensive experiments using benchmark datasets and real-captured blurry images. The results demonstrate that our approach significantly outperforms existing methods in terms of both quality and flexibility. The code for our work is available at https://github.com/zhihongz/BDINR.

Submitted 28 August, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Accepted by IJCV

arXiv:2310.01861 [pdf, other]

Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprising 572 videos and 34,300 annotated frames, covering a wide range of realistic clinical scenarios. Furthermore, we propose a novel frequency and localization feature aggregation network (FLA-Net) that learns temporal features from the frequency domain and predicts additional lesion location positions to assist with breast lesion segmentation. We also devise a localization-based contrastive loss to reduce the lesion location distance between neighboring video frames within the same video and enlarge the location distances between frames from different ultrasound videos. Our experiments on our annotated dataset and two public video polyp segmentation datasets demonstrate that our proposed FLA-Net achieves state-of-the-art performance in breast lesion segmentation in US videos and video polyp segmentation while significantly reducing time and space complexity. Our model and dataset are available at https://github.com/jhl-Det/FLA-Net.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 10 pages

arXiv:2305.05425 [pdf]

doi 10.1109/TGRS.2023.3275306

3DInvNet: A Deep Learning-Based 3D Ground-Penetrating Radar Data Inversion

Authors: Qiqi Dai, Yee Hui Lee, Hai-Han Sun, Genevieve Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: The reconstruction of the 3D permittivity map from ground-penetrating radar (GPR) data is of great importance for mapping subsurface environments and inspecting underground structural integrity. Traditional iterative 3D reconstruction algorithms suffer from strong non-linearity, ill-posedness, and high computational cost. To tackle these issues, a 3D deep learning scheme, called 3DInvNet, is proposed to reconstruct 3D permittivity maps from GPR C-scans. The proposed scheme leverages a prior 3D convolutional neural network with a feature attention mechanism to suppress the noise in the C-scans due to subsurface heterogeneous soil environments. Then a 3D U-shaped encoder-decoder network with multi-scale feature aggregation modules is designed to establish the optimal inverse mapping from the denoised C-scans to 3D permittivity maps. Furthermore, a three-step separate learning strategy is employed to pre-train and fine-tune the networks. The proposed scheme is applied to numerical simulation as well as real measurement data. The quantitative and qualitative results show the network capability, generalizability, and robustness in denoising GPR C-scans and reconstructing 3D permittivity maps of subsurface objects.

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2301.10167 [pdf, other]

EEG Opto-processor: epileptic seizure detection using diffractive photonic computing units

Authors: Tao Yan, Maoqi Zhang, Sen Wan, Kaifeng Shang, Haiou Zhang, Xun Cao, Xing Lin, Qionghai Dai

Abstract: Electroencephalography (EEG) analysis extracts critical information from brain signals, which has provided fundamental support for various applications, including brain-disease diagnosis and brain-computer interface. However, the real-time processing of large-scale EEG signals at high energy efficiency has placed great challenges for electronic processors on edge computing devices. Here, we propose the EEG opto-processor based on diffractive photonic computing units (DPUs) to effectively process the extracranial and intracranial EEG signals and perform epileptic seizure detection. The signals of EEG channels within a second-time window are optically encoded as inputs to the constructed diffractive neural networks for classification, which monitors the brain state to determine whether it's the symptom of an epileptic seizure or not. We developed both the free-space and integrated DPUs as edge computing systems and demonstrated their applications for real-time epileptic seizure detection with the benchmark datasets, i.e., the CHB-MIT extracranial EEG dataset and Epilepsy-iEEG-Multicenter intracranial EEG dataset, at high computing performance. Along with the channel selection mechanism, both the numerical evaluations and experimental results validated the sufficient high classification accuracies of the proposed opto-processors for supervising the clinical diagnosis. Our work opens up a new research direction of utilizing photonic computing techniques for processing large-scale EEG signals in promoting its broader applications.

Submitted 9 December, 2022; originally announced January 2023.

Comments: 22 pages, 5 figures

arXiv:2209.15180 [pdf, other]

SCI: A Spectrum Concentrated Implicit Neural Compression for Biomedical Data

Authors: Runzhao Yang, Tingxiong Xiao, Yuxiao Cheng, Qianni Cao, Jinyuan Qu, Jinli Suo, Qionghai Dai

Abstract: Massive collection and explosive growth of biomedical data demand effective compression for efficient storage, transmission and sharing. Readily available visual data compression techniques have been studied extensively but are tailored for natural images/videos, and thus show limited performance on biomedical data, which have different features and larger diversity. Emerging implicit neural representation (INR) is gaining momentum and demonstrates high promise for fitting diverse visual data in a target-data-specific manner, but a general compression scheme covering diverse biomedical data is so far absent. To address this issue, we first derive a mathematical explanation for INR's spectrum concentration property and an analytical insight on the design of an INR-based compressor. Further, we propose a Spectrum Concentrated Implicit neural compression (SCI) which adaptively partitions the complex biomedical data into blocks matching INR's concentrated spectrum envelope, and design a funnel-shaped neural network capable of representing each block with a small number of parameters. Based on this design, we conduct compression via optimization under a given budget and allocate the available parameters with high representation accuracy. The experiments show SCI's superior performance to state-of-the-art methods including commercial compressors, data-driven ones, and INR-based counterparts on diverse biomedical data. The source code can be found at https://github.com/RichealYoung/ImplicitNeuralCompression.git.

Submitted 23 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: accepted to AAAI2023

ACM Class: I.4.2; I.2.10

arXiv:2207.08201 [pdf, other]

doi 10.1109/TIP.2023.3244417

INFWIDE: Image and Feature Space Wiener Deconvolution Network for Non-blind Image Deblurring in Low-Light Conditions

Authors: Zhihong Zhang, Yuxiao Cheng, Jinli Suo, Liheng Bian, Qionghai Dai

Abstract: Under low-light environment, handheld photography suffers from severe camera shake under long exposure settings. Although existing deblurring algorithms have shown promising performance on well-exposed blurry images, they still cannot cope with low-light snapshots. Sophisticated noise and saturation regions are two dominating challenges in practical low-light deblurring. In this work, we propose a novel non-blind deblurring method dubbed image and feature space Wiener deconvolution network (INFWIDE) to tackle these problems systematically. In terms of algorithm design, INFWIDE proposes a two-branch architecture, which explicitly removes noise and hallucinates saturated regions in the image space and suppresses ringing artifacts in the feature space, and integrates the two complementary outputs with a subtle multi-scale fusion network for high quality night photograph deblurring. For effective network training, we design a set of loss functions integrating a forward imaging model and backward reconstruction to form a close-loop regularization to secure good convergence of the deep neural network. Further, to optimize INFWIDE's applicability in real low-light conditions, a physical-process-based low-light noise model is employed to synthesize realistic noisy night photographs for model training. Taking advantage of the traditional Wiener deconvolution algorithm's physically driven characteristics and arisen deep neural network's representation ability, INFWIDE can recover fine details while suppressing the unpleasant artifacts during deblurring. Extensive experiments on synthetic data and real data demonstrate the superior performance of the proposed approach.

Submitted 17 February, 2023; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE Trans. Image Process, early access version available at https://ieeexplore.ieee.org/document/10047966

arXiv:2207.06527 [pdf]

doi 10.1109/LGRS.2022.3192003

A Deep Learning-Based GPR Forward Solver for Predicting B-Scans of Subsurface Objects

Authors: Qiqi Dai, Yee Hui Lee, Hai-Han Sun, Jiwei Qian, Genevieve Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: The forward full-wave modeling of ground-penetrating radar (GPR) facilitates the understanding and interpretation of GPR data. Traditional forward solvers require excessive computational resources, especially when their repetitive executions are needed in signal processing and/or machine learning algorithms for GPR data inversion. To alleviate the computational burden, a deep learning-based 2D GPR forward solver is proposed to predict the GPR B-scans of subsurface objects buried in the heterogeneous soil. The proposed solver is constructed as a bimodal encoder-decoder neural network. Two encoders followed by an adaptive feature fusion module are designed to extract informative features from the subsurface permittivity and conductivity maps. The decoder subsequently constructs the B-scans from the fused feature representations. To enhance the network's generalization capability, transfer learning is employed to fine-tune the network for new scenarios vastly different from those in training set. Numerical results show that the proposed solver achieves a mean relative error of 1.28%. For predicting the B-scan of one subsurface object, the proposed solver requires 12 milliseconds, which is 22,500x less than the time required by a classical physics-based solver.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2205.07567 [pdf]

doi 10.1109/TAP.2022.3176386

DMRF-UNet: A Two-Stage Deep Learning Scheme for GPR Data Inversion under Heterogeneous Soil Conditions

Authors: Qiqi Dai, Yee Hui Lee, Hai-Han Sun, Genevieve Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: Traditional ground-penetrating radar (GPR) data inversion leverages iterative algorithms which suffer from high computation costs and low accuracy when applied to complex subsurface scenarios. Existing deep learning-based methods focus on the ideal homogeneous subsurface environments and ignore the interference due to clutters and noise in real-world heterogeneous environments. To address these issues, a two-stage deep neural network (DNN), called DMRF-UNet, is proposed to reconstruct the permittivity distributions of subsurface objects from GPR B-scans under heterogeneous soil conditions. In the first stage, a U-shape DNN with multi-receptive-field convolutions (MRF-UNet1) is built to remove the clutters due to inhomogeneity of the heterogeneous soil. Then the denoised B-scan from the MRF-UNet1 is combined with the noisy B-scan to be inputted to the DNN in the second stage (MRF-UNet2). The MRF-UNet2 learns the inverse mapping relationship and reconstructs the permittivity distribution of subsurface objects. To avoid information loss, an end-to-end training method combining the loss functions of two stages is introduced. A wide range of subsurface heterogeneous scenarios and B-scans are generated to evaluate the inversion performance. The test results in the numerical experiment and the real measurement show that the proposed network reconstructs the permittivities, shapes, sizes, and locations of subsurface objects with high accuracy. The comparison with existing methods demonstrates the superiority of the proposed methodology for the inversion under heterogeneous soil conditions.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2204.04987 [pdf, other]

doi 10.1016/j.inffus.2023.01.013

A Dual Sensor Computational Camera for High Quality Dark Videography

Authors: Yuxiao Cheng, Runzhao Yang, Zhihong Zhang, Jinli Suo, Qionghai Dai

Abstract: Videos captured under low light conditions suffer from severe noise. A variety of efforts have been devoted to image/video noise suppression and made large progress. However, in extremely dark scenarios, extensive photon starvation would hamper precise noise modeling. Instead, developing an imaging system collecting more photons is a more effective way for high-quality video capture under low illuminations. In this paper, we propose to build a dual-sensor camera to additionally collect the photons in NIR wavelength, and make use of the correlation between RGB and near-infrared (NIR) spectrum to perform high-quality reconstruction from noisy dark video pairs. In hardware, we build a compact dual-sensor camera capturing RGB and NIR videos simultaneously. Computationally, we propose a dual-channel multi-frame attention network (DCMAN) utilizing spatial-temporal-spectral priors to reconstruct the low-light RGB and NIR videos. In addition, we build a high-quality paired RGB and NIR video dataset, based on which the approach can be applied to different sensors easily by training the DCMAN model with simulated noisy input following a physical-process-based CMOS noise model. Both experiments on synthetic and real videos validate the performance of this compact dual-sensor camera design and the corresponding reconstruction algorithm in dark videography.

Submitted 11 April, 2022; originally announced April 2022.

Journal ref: Information Fusion Volume 93, May 2023, Pages 429-440

arXiv:2112.13494 [pdf]

doi 10.1109/TGRS.2021.3138974

Estimating Parameters of the Tree Root in Heterogeneous Soil Environments via Mask-Guided Multi-Polarimetric Integration Neural Network

Authors: Hai-Han Sun, Yee Hui Lee, Qiqi Dai, Chongyi Li, Genevieve Ow, Mohamed Lokman Mohd Yusof, Abdulkadir C. Yucel

Abstract: Ground-penetrating radar (GPR) has been used as a non-destructive tool for tree root inspection. Estimating root-related parameters from GPR radargrams greatly facilitates root health monitoring and imaging. However, the task of estimating root-related parameters is challenging as the root reflection is a complex function of multiple root parameters and root orientations. Existing methods can only estimate a single root parameter at a time without considering the influence of other parameters and root orientations, resulting in limited estimation accuracy under different root conditions. In addition, soil heterogeneity introduces clutter in GPR radargrams, making the data processing and interpretation even harder. To address these issues, a novel neural network architecture, called mask-guided multi-polarimetric integration neural network (MMI-Net), is proposed to automatically and simultaneously estimate multiple root-related parameters in heterogeneous soil environments. The MMI-Net includes two sub-networks: a MaskNet that predicts a mask to highlight the root reflection area to eliminate interfering environmental clutter, and a ParaNet that uses the predicted mask as guidance to integrate, extract, and emphasize informative features in multi-polarimetric radargrams for accurate estimation of five key root-related parameters. The parameters include the root depth, diameter, relative permittivity, horizontal and vertical orientation angles. Experimental results demonstrate that the proposed MMI-Net achieves high estimation accuracy in these root-related parameters. This is the first work that takes the combined contributions of root parameters and spatial orientations into account and simultaneously estimates multiple root-related parameters. The data and code implemented in the paper can be found at https://haihan-sun.github.io/GPR.html.

Submitted 26 December, 2021; originally announced December 2021.

Comments: 14 pages, 12 figures

arXiv:2111.09103 [pdf, other]

Fast and Light-Weight Network for Single Frame Structured Illumination Microscopy Super-Resolution

Authors: Xi Cheng, Jun Li, Qiang Dai, Zhenyong Fu, Jian Yang

Abstract: Structured illumination microscopy (SIM) is an important super-resolution based microscopy technique that breaks the diffraction limit and enhances optical microscopy systems. With the development of biology and medical engineering, there is a high demand for real-time and robust SIM imaging under extreme low light and short exposure environments. Existing SIM techniques typically require multiple structured illumination frames to produce a high-resolution image. In this paper, we propose a single-frame structured illumination microscopy (SF-SIM) based on deep learning. Our SF-SIM only needs one shot of a structured illumination frame and generates similar results compared with the traditional SIM systems that typically require 15 shots. In our SF-SIM, we propose a noise estimator which can effectively suppress the noise in the image and enable our method to work under the low light and short exposure environment, without the need for stacking multiple frames for non-local denoising. We also design a bandpass attention module that makes our deep network more sensitive to the change of frequency and enhances the imaging quality. Our proposed SF-SIM is almost 14 times faster than traditional SIM methods when achieving similar results. Therefore, our method is significantly valuable for the development of microbiology and medicine.

Submitted 17 November, 2021; originally announced November 2021.

Comments: 9 pages

arXiv:2109.08880 [pdf, other]

doi 10.1109/JPROC.2023.3338272

Computational Imaging and Artificial Intelligence: The Next Revolution of Mobile Vision

Authors: Jinli Suo, Weihang Zhang, Jin Gong, Xin Yuan, David J. Brady, Qionghai Dai

Abstract: Signal capture stands in the forefront to perceive and understand the environment and thus imaging plays the pivotal role in mobile vision. Recent explosive progresses in Artificial Intelligence (AI) have shown great potential to develop advanced mobile platforms with new imaging devices. Traditional imaging systems based on the "capturing images first and processing afterwards" mechanism cannot meet this unprecedented demand. Differently, Computational Imaging (CI) systems are designed to capture high-dimensional data in an encoded manner to provide more information for mobile vision systems. Thanks to AI, CI can now be used in real systems by integrating deep learning algorithms into the mobile vision platform to achieve the closed loop of intelligent acquisition, processing and decision making, thus leading to the next revolution of mobile vision. Starting from the history of mobile vision using digital cameras, this work first introduces the advances of CI in diverse applications and then conducts a comprehensive review of current research topics combining CI and AI. Motivated by the fact that most existing studies only loosely connect CI and AI (usually using AI to improve the performance of CI and only limited works have deeply connected them), in this work, we propose a framework to deeply integrate CI and AI by using the example of self-driving vehicles with high-speed communication, edge computing and traffic planning. Finally, we outlook the future of CI plus AI by investigating new materials, brain science and new computing techniques to shed light on new directions of mobile vision systems.

Submitted 18 September, 2021; originally announced September 2021.

arXiv:2107.01422 [pdf, other]

Imaging dynamics beneath turbid media via parallelized single-photon detection

Authors: Shiqi Xu, Xi Yang, Wenhui Liu, Joakim Jonsson, Ruobing Qian, Pavan Chandra Konda, Kevin C. Zhou, Lucas Kreiss, Qionghai Dai, Haoqian Wang, Edouard Berrocal, Roarke Horstmeyer

Abstract: Noninvasive optical imaging through dynamic scattering media has numerous important biomedical applications but still remains a challenging task. While standard diffuse imaging methods measure optical absorption or fluorescent emission, it is also well-established that the temporal correlation of scattered coherent light diffuses through tissue much like optical intensity. Few works to date, however, have aimed to experimentally measure and process such temporal correlation data to demonstrate deep-tissue video reconstruction of decorrelation dynamics. In this work, we utilize a single-photon avalanche diode (SPAD) array camera to simultaneously monitor the temporal dynamics of speckle fluctuations at the single-photon level from 12 different phantom tissue surface locations delivered via a customized fiber bundle array. We then apply a deep neural network to convert the acquired single-photon measurements into video of scattering dynamics beneath rapidly decorrelating tissue phantoms. We demonstrate the ability to reconstruct images of transient (0.1-0.4s) dynamic events occurring up to 8 mm beneath a decorrelating tissue phantom with millimeter-scale resolution, and highlight how our model can flexibly extend to monitor flow speed within buried phantom vessels.

Submitted 12 June, 2022; v1 submitted 3 July, 2021; originally announced July 2021.

arXiv:2106.15765 [pdf, other]

doi 10.1364/PRJ.435256

10-mega pixel snapshot compressive imaging with a hybrid coded aperture

Authors: Zhihong Zhang, Chao Deng, Yang Liu, Xin Yuan, Jinli Suo, Qionghai Dai

Abstract: High resolution images are widely used in our daily life, whereas high-speed video capture is challenging due to the low frame rate of cameras working at the high resolution mode. Digging deeper, the main bottleneck lies in the low throughput of existing imaging systems. Towards this end, snapshot compressive imaging (SCI) was proposed as a promising solution to improve the throughput of imaging systems by compressive sampling and computational reconstruction. During acquisition, multiple high-speed images are encoded and collapsed to a single measurement. After this, algorithms are employed to retrieve the video frames from the coded snapshot. Recently developed Plug-and-Play (PnP) algorithms make it possible for SCI reconstruction in large-scale problems. However, the lack of high-resolution encoding systems still precludes SCI's wide application. In this paper, we build a novel hybrid coded aperture snapshot compressive imaging (HCA-SCI) system by incorporating a dynamic liquid crystal on silicon and a high-resolution lithography mask. We further implement a PnP reconstruction algorithm with cascaded denoisers for high quality reconstruction. Based on the proposed HCA-SCI system and algorithm, we achieve a 10-mega pixel SCI system to capture high-speed scenes, leading to a high throughput of 4.6G voxels per second. Both simulation and real data experiments verify the feasibility and performance of our proposed HCA-SCI scheme.

Submitted 15 August, 2021; v1 submitted 29 June, 2021; originally announced June 2021.

Comments: 11 pages, 8 figures, accepted by Photonics Research

arXiv:2106.00682 [pdf]

Prostate cancer histopathology with label-free multispectral deep UV microscopy quantifies phenotypes of tumor grade and aggressiveness

Authors: Soheil Soltani, Ashkan Ojaghi, Hui Qiao, Nischita Kaza, Xinyang Li, Qionghai Dai, Adeboye O Osunkoya, Francisco E Robles

Abstract: Identifying prostate cancer patients that are harboring aggressive forms of prostate cancer remains a significant clinical challenge. To shed light on this problem, we develop an approach based on multispectral deep-ultraviolet (UV) microscopy that provides novel quantitative insight into the aggressiveness and grade of this disease. First, we find that UV spectral signatures from endogenous molecules give rise to a phenotypical continuum that differentiates critical structures of thin tissue sections with subcellular spatial resolution, including nuclei, cytoplasm, stroma, basal cells, nerves, and inflammation. Further, we show that this phenotypical continuum can be applied as a surrogate biomarker of prostate cancer malignancy, where patients with the most aggressive tumors show a ubiquitous glandular phenotypical shift. Lastly, we adapt a two-part Cycle-consistent Generative Adversarial Network to translate the label-free deep-UV images into virtual hematoxylin and eosin (H&E) stained images. Agreement between the virtual H&E images and the gold standard H&E-stained tissue sections is evaluated by a panel of pathologists who find that the two modalities are in excellent agreement. This work has significant implications towards improving our ability to objectively quantify prostate cancer grade and aggressiveness, thus improving the management and clinical outcomes of prostate cancer patients. This same approach can also be applied broadly in other tumor types to achieve low-cost, stain-free, quantitative histopathological analysis.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.03078 [pdf, other]

Universal and Flexible Optical Aberration Correction Using Deep-Prior Based Deconvolution

Authors: Xiu Li, Jinli Suo, Weihang Zhang, Xin Yuan, Qionghai Dai

Abstract: High quality imaging usually requires bulky and expensive lenses to compensate geometric and chromatic aberrations. This poses high constraints on the optical hash or low cost applications. Although one can utilize algorithmic reconstruction to remove the artifacts of low-end lenses, the degeneration from optical aberrations is spatially varying and the computation has to trade off efficiency for performance. For example, we need to conduct patch-wise optimization or train a large set of local deep neural networks to achieve high reconstruction performance across the whole image. In this paper, we propose a PSF aware plug-and-play deep network, which takes the aberrant image and PSF map as input and produces the latent high quality version via incorporating lens-specific deep priors, thus leading to a universal and flexible optical aberration correction method. Specifically, we pre-train a base model from a set of diverse lenses and then adapt it to a given lens by quickly refining the parameters, which largely alleviates the time and memory consumption of model learning. The approach is of high efficiency in both training and testing stages. Extensive results verify the promising applications of our proposed approach for compact low-end cameras.

Submitted 18 August, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: ICCV2021

arXiv:2103.13043 [pdf, other]

doi 10.1109/TPAMI.2018.2845393

Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications

Authors: Gaochang Wu, Yebin Liu, Lu Fang, Qionghai Dai, Tianyou Chai

Abstract: In this paper, a novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views. We indicate that the reconstruction can be efficiently modeled as angular restoration on an epipolar plane image (EPI). The main problem in direct reconstruction on the EPI involves an information asymmetry between the spatial and angular dimensions, where the detailed portion in the angular dimensions is damaged by undersampling. Directly upsampling or super-resolving the light field in the angular dimensions causes ghosting effects. To suppress these ghosting effects, we contribute a novel "blur-restoration-deblur" framework. First, the "blur" step is applied to extract the low-frequency components of the light field in the spatial dimensions by convolving each EPI slice with a selected blur kernel. Then, the "restoration" step is implemented by a CNN, which is trained to restore the angular details of the EPI. Finally, we use a non-blind "deblur" operation to recover the spatial high frequencies suppressed by the EPI blur. We evaluate our approach on several datasets, including synthetic scenes, real-world scenes and challenging microscope light field data. We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms. We further show extended applications, including depth enhancement and interpolation for unstructured input. More importantly, a novel rendering approach is presented by combining the proposed framework and depth information to handle large disparities.

Submitted 24 March, 2021; originally announced March 2021.

Comments: Published in IEEE TPAMI, 2019

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

arXiv:2103.05843 [pdf, other]

Learning to Estimate Kernel Scale and Orientation of Defocus Blur with Asymmetric Coded Aperture

Authors: Jisheng Li, Qi Dai, Jiangtao Wen

Abstract: Consistent in-focus input imagery is an essential precondition for machine vision systems to perceive the dynamic environment. A defocus blur severely degrades the performance of vision systems. To tackle this problem, we propose a deep-learning-based framework estimating the kernel scale and orientation of the defocus blur to adjust lens focus rapidly. Our pipeline utilizes 3D ConvNet for a variable number of input hypotheses to select the optimal slice from the input stack. We use random shuffle and Gumbel-softmax to improve network performance. We also propose to generate synthetic defocused images with various asymmetric coded apertures to facilitate training. Experiments are conducted to demonstrate the effectiveness of our framework.

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2101.04822 [pdf, other]

Plug-and-Play Algorithms for Video Snapshot Compressive Imaging

Authors: Xin Yuan, Yang Liu, Jinli Suo, Frédo Durand, Qionghai Dai

Abstract: We consider the reconstruction problem of video snapshot compressive imaging (SCI), which captures high-speed videos using a low-speed 2D sensor (detector). The underlying principle of SCI is to modulate sequential high-speed frames with different masks and then integrate these encoded frames into a single snapshot on the sensor, so the sensor itself can be low-speed. On one hand, video SCI enjoys the advantages of low-bandwidth, low-power and low-cost. On the other hand, applying SCI to large-scale problems (HD or UHD videos) in our daily life is still challenging and one of the bottlenecks lies in the reconstruction algorithm. Existing algorithms are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep learning based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload. We first employ the image deep denoising priors to show that PnP can recover a UHD color video with 30 frames from a snapshot measurement. Since videos have strong temporal correlation, by employing the video deep denoising priors, we achieve a significant improvement in the results. Furthermore, we extend the proposed PnP algorithms to the color SCI system using mosaic sensors, where each pixel only captures the red, green or blue channels. A joint reconstruction and demosaicing paradigm is developed for flexible and high quality reconstruction of color video SCI systems. Extensive results on both simulation and real datasets verify the superiority of our proposed algorithm.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 18 pages, 12 figures and 4 tables. Journal extension of arXiv:2003.13654. Code available at https://github.com/liuyang12/PnP-SCI_python

arXiv:2008.11659 [pdf]

doi 10.1038/s41566-021-00796-w

Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit

Authors: Tiankuang Zhou, Xing Lin, Jiamin Wu, Yitong Chen, Hao Xie, Yipeng Li, Jintao Fan, Huaqiang Wu, Lu Fang, Qionghai Dai

Abstract: Application-specific optical processors have been considered disruptive technologies for modern computing that can fundamentally accelerate the development of artificial intelligence (AI) by offering substantially improved computing performance. Recent advancements in optical neural network architectures for neural information processing have been applied to perform various machine learning tasks. However, the existing architectures have limited complexity and performance; and each of them requires its own dedicated design that cannot be reconfigured to switch between different neural network models for different applications after deployment. Here, we propose an optoelectronic reconfigurable computing paradigm by constructing a diffractive processing unit (DPU) that can efficiently support different neural networks and achieve a high model complexity with millions of neurons. It allocates almost all of its computational operations optically and achieves extremely high speed of data modulation and large-scale network parameter updating by dynamically programming optical modulators and photodetectors. We demonstrated the reconfiguration of the DPU to implement various diffractive feedforward and recurrent neural networks and developed a novel adaptive training approach to circumvent the system imperfections. We applied the trained networks for high-speed classifying of handwritten digit images and human action videos over benchmark datasets, and the experimental results revealed a comparable classification accuracy to the electronic computing approaches. Furthermore, our prototype system built with off-the-shelf optoelectronic components surpasses the performance of state-of-the-art graphics processing units (GPUs) by several times on computing speed and more than an order of magnitude on system energy efficiency.

Submitted 26 August, 2020; originally announced August 2020.

arXiv:2005.12690 [pdf, other]

doi 10.1109/TPAMI.2020.2996798

SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis

Authors: Mengqi Ji, Jinzhi Zhang, Qionghai Dai, Lu Fang

Abstract: Multi-view stereopsis (MVS) tries to recover the 3D model from 2D images. As the observations become sparser, the significant 3D information loss makes the MVS problem more challenging. Instead of only focusing on densely sampled conditions, we investigate sparse-MVS with large baseline angles since the sparser sensation is more practical and more cost-efficient. By investigating various observation sparsities, we show that the classical depth-fusion pipeline becomes powerless for the case with a larger baseline angle that worsens the photo-consistency check. As another line of the solution, we present SurfaceNet+, a volumetric method to handle the 'incompleteness' and the 'inaccuracy' problems induced by a very sparse MVS setup. Specifically, the former problem is handled by a novel volume-wise view selection approach. It owns superiority in selecting valid views while discarding invalid occluded views by considering the geometric prior. Furthermore, the latter problem is handled via a multi-scale strategy that consequently refines the recovered geometry around the region with the repeating pattern. The experiments demonstrate the tremendous performance gap between SurfaceNet+ and state-of-the-art methods in terms of precision and recall. Under the extreme sparse-MVS settings in two datasets, where existing methods can only return very few points, SurfaceNet+ still works as well as in the dense MVS setting. The benchmark and the implementation are publicly available at https://github.com/mjiUST/SurfaceNet-plus.

Submitted 26 May, 2020; originally announced May 2020.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), May 2020

Journal ref: 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2005.12597 [pdf, other]

Perceptual Extreme Super Resolution Network with Receptive Field Block

Authors: Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo

Abstract: Perceptual extreme super-resolution for a single image is extremely difficult, because the texture details of different images vary greatly. To tackle this difficulty, we develop a super-resolution network with receptive field block based on Enhanced SRGAN. We call our network RFB-ESRGAN. The key contributions are listed as follows. First, for the purpose of extracting multi-scale information and enhancing the feature discriminability, we apply the receptive field block (RFB) to super-resolution. RFB has achieved competitive results in object detection and classification. Second, instead of using large convolution kernels in the multi-scale receptive field block, several small kernels are used in RFB, which enables us to extract detailed features and reduce the computational complexity. Third, we alternately use different upsampling methods in the upsampling stage to reduce the high computational complexity while still retaining satisfactory performance. Fourth, we use an ensemble of 10 models from different iterations to improve the robustness of the model and reduce the noise introduced by each individual model. Our experimental results show the superior performance of RFB-ESRGAN. According to the preliminary results of the NTIRE 2020 Perceptual Extreme Super-Resolution Challenge, our solution ranks first among all the participants.

Submitted 26 May, 2020; originally announced May 2020.

Comments: CVPRW 2020 accepted oral, 8 pages, 45 figures

arXiv:2005.01056 [pdf, other]

NTIRE 2020 Challenge on Perceptual Extreme Super-Resolution: Methods and Results

Authors: Kai Zhang, Shuhang Gu, Radu Timofte, Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo, Younghyun Jo, Sejong Yang, Seon Joo Kim, Lin Zha, Jiande Jiang, Xinbo Gao, Wen Lu, Jing Liu, Kwangjin Yoon, Taegyun Jeon, Kazutoshi Akita, Takeru Ooba, Norimichi Ukita, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Dongliang He , et al. (38 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 challenge on perceptual extreme super-resolution with focus on proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of 16 based on a set of prior examples of low and corresponding high resolution images. The goal is to obtain a network design capable of producing high resolution results with the best perceptual quality and similar to the ground truth. The track had 280 registered participants, and 19 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution.

Submitted 3 May, 2020; originally announced May 2020.

Comments: CVPRW 2020

arXiv:2003.14237 [pdf]

doi 10.1364/OL.417039

Single-pixel coherent diffraction imaging

Authors: Meng Li, Liheng Bian, Guoan Zheng, Andrew Maiden, Yang Liu, Yiming Li, Qionghai Dai, Jun Zhang

Abstract: Complex-field imaging is indispensable for numerous applications at wavelengths from X-ray to THz, with amplitude describing transmittance (or reflectivity) and phase revealing intrinsic structure of the target object. Coherent diffraction imaging (CDI) employs iterative phase retrieval algorithms to process diffraction measurements and is the predominant non-interferometric method to image complex fields. However, the working spectrum of CDI is quite narrow, because the diffraction measurements on which it relies require dense array detection with ultra-high dynamic range. Here we report a single-pixel CDI technique that works for a wide waveband. A single-pixel detector instead of an array sensor is employed in the far field for detection. It repeatedly records the DC-only component of the diffracted wavefront scattered from an object as it is illuminated by a sequence of binary modulation patterns. This decreases the measurements' dynamic range by several orders of magnitude. We employ an efficient single-pixel phase-retrieval algorithm to jointly recover the object's 2D amplitude and phase maps from the 1D intensity-only measurements. No a priori object information is needed in the recovery process. We validate the technique's quantitative phase imaging nature using both calibrated phase objects and biological samples, and demonstrate its wide working spectrum with both 488-nm visible light and 980-nm near-infrared light. Our approach paves the way for complex-field imaging in a wider waveband where 2D detector arrays are not available, with broad applications in life and material sciences.

Submitted 29 March, 2020; originally announced March 2020.

arXiv:2003.13654 [pdf, other]

Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging

Authors: Xin Yuan, Yang Liu, Jinli Suo, Qionghai Dai

Abstract: Snapshot compressive imaging (SCI) aims to capture high-dimensional (usually 3D) images using a 2D sensor (detector) in a single snapshot. Though enjoying the advantages of low-bandwidth, low-power and low-cost, applying SCI to large-scale problems (HD or UHD videos) in our daily life is still challenging. The bottleneck lies in the reconstruction algorithms; they are either too slow (iterative optimization algorithms) or not flexible to the encoding process (deep learning based end-to-end networks). In this paper, we develop fast and flexible algorithms for SCI based on the plug-and-play (PnP) framework. In addition to the widely used PnP-ADMM method, we further propose the PnP-GAP (generalized alternating projection) algorithm with a lower computational workload and prove the convergence of PnP-GAP under the SCI hardware constraints. By employing deep denoising priors, we show for the first time that PnP can recover a UHD color video ($3840\times 1644\times 48$ with PSNR above 30dB) from a snapshot 2D measurement. Extensive results on both simulation and real datasets verify the superiority of our proposed algorithm. The code is available at https://github.com/liuyang12/PnP-SCI.

Submitted 17 July, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: CVPR 2020. Corrected a proof of convergence in previous version

arXiv:1811.03455 [pdf, other]

High fidelity single-pixel imaging

Authors: Chao Deng, Xuemei Hu, Xiaoxu Li, Jinli Suo, Zhili Zhang, Qionghai Dai

Abstract: Single-pixel imaging (SPI) is an emerging technique which has attracted wide attention in various research fields. However, restricted by the low reconstruction quality and large number of measurements, its practical application is still in its infancy. Inspired by the fact that natural scenes exhibit unique degenerate structures in the low dimensional subspace, we propose to take advantage of the local prior in convolutional sparse coding to implement high fidelity single-pixel imaging. Specifically, using a statistical learning strategy, the target scene can be sparsely represented on an overcomplete dictionary. The dictionary is composed of various bases learned from a natural image database. We introduce the above local prior into the conventional SPI framework to improve the final reconstruction quality. Experiments on both synthetic data and real captured data demonstrate that our method can achieve better reconstruction from the same measurements, and consequently reduce the number of measurements required for the same reconstruction quality.

Submitted 7 November, 2018; originally announced November 2018.

Comments: 5 pages, 6 figures

Showing 1–39 of 39 results for author: Dai, Q