-
Self-Supervised Elimination of Non-Independent Noise in Hyperspectral Imaging
Authors:
Guangrui Ding,
Chang Liu,
Jiaze Yin,
Xinyan Teng,
Yuying Tan,
Hongjian He,
Haonan Lin,
Lei Tian,
Ji-Xin Cheng
Abstract:
Hyperspectral imaging has been widely used for spectral and spatial identification of target molecules, yet often contaminated by sophisticated noise. Current denoising methods generally rely on independent and identically distributed noise statistics, showing corrupted performance for non-independent noise removal. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a…
▽ More
Hyperspectral imaging has been widely used for spectral and spatial identification of target molecules, yet often contaminated by sophisticated noise. Current denoising methods generally rely on independent and identically distributed noise statistics, showing corrupted performance for non-independent noise removal. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a deep learning denoising architecture tailor-made for removing non-independent noise from a single hyperspectral image stack. We utilize hyperspectral stimulated Raman scattering and mid-infrared photothermal microscopy as the testbeds, where the noise is spatially correlated and spectrally varied. Based on single hyperspectral images, SPEND permutates odd and even spectral frames to generate two stacks with identical noise properties, and uses the pairs for efficient self-supervised noise-to-noise training. SPEND achieved an 8-fold signal-to-noise improvement without having access to the ground truth data. SPEND enabled accurate mapping of low concentration biomolecules in both fingerprint and silent regions, demonstrating its robustness in sophisticated cellular environments.
△ Less
Submitted 15 September, 2024;
originally announced September 2024.
-
Spectrum Prediction With Deep 3D Pyramid Vision Transformer Learning
Authors:
Guangliang Pan,
Qihui Wu,
Bo Zhou,
Jie Li,
Wei Wang,
Guoru Ding,
David K. Y. Yau
Abstract:
In this paper, we propose a deep learning (DL)-based task-driven spectrum prediction framework, named DeepSPred. The DeepSPred comprises a feature encoder and a task predictor, where the encoder extracts spectrum usage pattern features, and the predictor configures different networks according to the task requirements to predict future spectrum. Based on the Deep- SPred, we first propose a novel 3…
▽ More
In this paper, we propose a deep learning (DL)-based task-driven spectrum prediction framework, named DeepSPred. The DeepSPred comprises a feature encoder and a task predictor, where the encoder extracts spectrum usage pattern features, and the predictor configures different networks according to the task requirements to predict future spectrum. Based on the Deep- SPred, we first propose a novel 3D spectrum prediction method combining a flow processing strategy with 3D vision Transformer (ViT, i.e., Swin) and a pyramid to serve possible applications such as spectrum monitoring task, named 3D-SwinSTB. 3D-SwinSTB unique 3D Patch Merging ViT-to-3D ViT Patch Expanding and pyramid designs help the model accurately learn the potential correlation of the evolution of the spectrogram over time. Then, we propose a novel spectrum occupancy rate (SOR) method by redesigning a predictor consisting exclusively of 3D convolutional and linear layers to serve possible applications such as dynamic spectrum access (DSA) task, named 3D-SwinLinear. Unlike the 3D-SwinSTB output spectrogram, 3D-SwinLinear projects the spectrogram directly as the SOR. Finally, we employ transfer learning (TL) to ensure the applicability of our two methods to diverse spectrum services. The results show that our 3D-SwinSTB outperforms recent benchmarks by more than 5%, while our 3D-SwinLinear achieves a 90% accuracy, with a performance improvement exceeding 10%.
△ Less
Submitted 20 August, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association
Authors:
Guanhua Ding,
Jianan Liu,
Yuxuan Xia,
Tao Huang,
Bing Zhu,
Jinping Sun
Abstract:
Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different…
▽ More
Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different regions. The PMRA model overcomes the drawbacks of previous data-region association (DRA) models by eliminating the approximation error of constrained estimation and using continuous integrals to more reliably calculate the association probabilities. Furthermore, the PMRA model is integrated with the Poisson multi-Bernoulli mixture (PMBM) filter for tracking multiple vehicles. Simulation results illustrate the superior estimation accuracy of the proposed PMRA-PMBM filter in terms of both positions and extents of the vehicles comparing with PMBM filters using the gamma Gaussian inverse Wishart and DRA implementations.
△ Less
Submitted 18 May, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar?
Authors:
Jianan Liu,
Guanhua Ding,
Yuxuan Xia,
Jinping Sun,
Tao Huang,
Lihua Xie,
Bing Zhu
Abstract:
Online 3D multi-object tracking (MOT) has recently received significant research interests due to the expanding demand of 3D perception in advanced driver assistance systems (ADAS) and autonomous driving (AD). Among the existing 3D MOT frameworks for ADAS and AD, conventional point object tracking (POT) framework using the tracking-by-detection (TBD) strategy has been well studied and accepted for…
▽ More
Online 3D multi-object tracking (MOT) has recently received significant research interests due to the expanding demand of 3D perception in advanced driver assistance systems (ADAS) and autonomous driving (AD). Among the existing 3D MOT frameworks for ADAS and AD, conventional point object tracking (POT) framework using the tracking-by-detection (TBD) strategy has been well studied and accepted for LiDAR and 4D imaging radar point clouds. In contrast, extended object tracking (EOT), another important framework which accepts the joint-detection-and-tracking (JDT) strategy, has rarely been explored for online 3D MOT applications. This paper provides the first systematic investigation of the EOT framework for online 3D MOT in real-world ADAS and AD scenarios. Specifically, the widely accepted TBD-POT framework, the recently investigated JDT-EOT framework, and our proposed TBD-EOT framework are compared via extensive evaluations on two open source 4D imaging radar datasets: View-of-Delft and TJ4DRadSet. Experiment results demonstrate that the conventional TBD-POT framework remains preferable for online 3D MOT with high tracking performance and low computational complexity, while the proposed TBD-EOT framework has the potential to outperform it in certain situations. However, the results also show that the JDT-EOT framework encounters multiple problems and performs inadequately in evaluation scenarios. After analyzing the causes of these phenomena based on various evaluation metrics and visualizations, we provide possible guidelines to improve the performance of these MOT frameworks on real-world data. These provide the first benchmark and important insights for the future development of 4D imaging radar-based online 3D MOT algorithms.
△ Less
Submitted 25 May, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Maurizio Denna,
Abdel Younes,
Ganzorig Gankhuyag,
Jingang Huh,
Myeong Kyun Kim,
Kihwan Yoon,
Hyeon-Cheol Moon,
Seungho Lee,
Yoonsik Choe,
Jinwoo Jeong,
Sungjei Kim,
Maciej Smyl,
Tomasz Latkowski,
Pawel Kubik,
Michal Sokolski,
Yujie Ma,
Jiahao Chao,
Zhou Zhou,
Hongfan Gao,
Zhengfeng Yang,
Zhenbing Zeng,
Zhengyang Zhuge,
Chenghua Li
, et al. (71 additional authors not shown)
Abstract:
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose…
▽ More
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
Abnormal Signal Recognition with Time-Frequency Spectrogram: A Deep Learning Approach
Authors:
Tingyan Kuang,
Huichao Chen,
Lu Han,
Rong He,
Wei Wang,
Guoru Ding
Abstract:
With the increasingly complex and changeable electromagnetic environment, wireless communication systems are facing jamming and abnormal signal injection, which significantly affects the normal operation of a communication system. In particular, the abnormal signals may emulate the normal signals, which makes it very challenging for abnormal signal recognition. In this paper, we propose a new abno…
▽ More
With the increasingly complex and changeable electromagnetic environment, wireless communication systems are facing jamming and abnormal signal injection, which significantly affects the normal operation of a communication system. In particular, the abnormal signals may emulate the normal signals, which makes it very challenging for abnormal signal recognition. In this paper, we propose a new abnormal signal recognition scheme, which combines time-frequency analysis with deep learning to effectively identify synthetic abnormal communication signals. Firstly, we emulate synthetic abnormal communication signals including seven jamming patterns. Then, we model an abnormal communication signals recognition system based on the communication protocol between the transmitter and the receiver. To improve the performance, we convert the original signal into the time-frequency spectrogram to develop an image classification algorithm. Simulation results demonstrate that the proposed method can effectively recognize the abnormal signals under various parameter configurations, even under low signal-to-noise ratio (SNR) and low jamming-to-signal ratio (JSR) conditions.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Edge Semantic Cognitive Intelligence for 6G Networks: Novel Theoretical Models, Enabling Framework, and Typical Applications
Authors:
Peihao Dong,
Qihui Wu,
Xiaofei Zhang,
Guoru Ding
Abstract:
Edge intelligence is anticipated to underlay the pathway to connected intelligence for 6G networks, but the organic confluence of edge computing and artificial intelligence still needs to be carefully treated. To this end, this article discusses the concepts of edge intelligence from the semantic cognitive perspective. Two instructive theoretical models for edge semantic cognitive intelligence (ES…
▽ More
Edge intelligence is anticipated to underlay the pathway to connected intelligence for 6G networks, but the organic confluence of edge computing and artificial intelligence still needs to be carefully treated. To this end, this article discusses the concepts of edge intelligence from the semantic cognitive perspective. Two instructive theoretical models for edge semantic cognitive intelligence (ESCI) are first established. Afterwards, the ESCI framework orchestrating deep learning with semantic communication is discussed. Two representative applications are present to shed light on the prospect of ESCI in 6G networks. Some open problems are finally listed to elicit the future research directions of ESCI.
△ Less
Submitted 9 July, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Analysis and Optimization of A Double-IRS Cooperatively Assisted System with A Quasi-Static Phase Shift Design
Authors:
Gengfa Ding,
Feng Yang,
Lianghui Ding,
Ying Cui
Abstract:
The analysis and optimization of single intelligent reflecting surface (IRS)-assisted systems have been extensively studied, whereas little is known regarding multiple-IRS-assisted systems. This paper investigates the analysis and optimization of a double-IRS cooperatively assisted downlink system, where a multi-antenna base station (BS) serves a single-antenna user with the help of two multi-elem…
▽ More
The analysis and optimization of single intelligent reflecting surface (IRS)-assisted systems have been extensively studied, whereas little is known regarding multiple-IRS-assisted systems. This paper investigates the analysis and optimization of a double-IRS cooperatively assisted downlink system, where a multi-antenna base station (BS) serves a single-antenna user with the help of two multi-element IRSs, connected by an inter-IRS channel. The channel between any two nodes is modeled with Rician fading. The BS adopts the instantaneous CSI-adaptive maximum-ratio transmission (MRT) beamformer, and the two IRSs adopt a cooperative quasi-static phase shift design. The goal is to maximize the average achievable rate, which can be reflected by the average channel power of the equivalent channel between the BS and user, at a low phase adjustment cost and computational complexity. First, we obtain tractable expressions of the average channel power of the equivalent channel in the general Rician factor, pure line of sight (LoS), and pure non-line of sight (NLoS) regimes, respectively. Then, we jointly optimize the phase shifts of the two IRSs to maximize the average channel power of the equivalent channel in these regimes. The optimization problems are challenging non-convex problems. We obtain globally optimal closed-form solutions for some cases and propose computationally efficient iterative algorithms to obtain stationary points for the other cases. Next, we compare the computational complexity for optimizing the phase shifts and the optimal average channel power of the double-IRS cooperatively assisted system with those of a counterpart single-IRS-assisted system at a large number of reflecting elements in the three regimes. Finally, we numerically demonstrate notable gains of the proposed solutions over the existing solutions at different system parameters.
△ Less
Submitted 4 June, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
Routing with Privacy for Drone Package Delivery Systems
Authors:
Geoffrey Ding,
Alex Berke,
Karthik Gopalakrishnan,
Kwassi H. Degue,
Hamsa Balakrishnan,
Max Z. Li
Abstract:
Unmanned aerial vehicles (UAVs), or drones, are increasingly being used to deliver goods from vendors to customers. To safely conduct these operations at scale, drones are required to broadcast position information as codified in remote identification (remote ID) regulations. However, location broadcast of package delivery drones introduces a privacy risk for customers using these delivery service…
▽ More
Unmanned aerial vehicles (UAVs), or drones, are increasingly being used to deliver goods from vendors to customers. To safely conduct these operations at scale, drones are required to broadcast position information as codified in remote identification (remote ID) regulations. However, location broadcast of package delivery drones introduces a privacy risk for customers using these delivery services: Third-party observers may leverage broadcast drone trajectories to link customers with their purchases, potentially resulting in a wide range of privacy risks. We propose a probabilistic definition of privacy risk based on the likelihood of associating a customer to a vendor given a package delivery route. Next, we quantify these risks, enabling drone operators to assess privacy risks when planning delivery routes. We then evaluate the impacts of various factors (e.g., drone capacity) on privacy and consider the trade-offs between privacy and delivery wait times. Finally, we propose heuristics for generating routes with privacy guarantees to avoid exhaustive enumeration of all possible routes and evaluate their performance on several realistic delivery scenarios.
△ Less
Submitted 29 June, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies
Authors:
Sicheng Zhao,
Guoli Jia,
Jufeng Yang,
Guiguang Ding,
Kurt Keutzer
Abstract:
Humans are emotional creatures. Multiple modalities are often involved when we express emotions, whether we do so explicitly (e.g., facial expression, speech) or implicitly (e.g., text, image). Enabling machines to have emotional intelligence, i.e., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects o…
▽ More
Humans are emotional creatures. Multiple modalities are often involved when we express emotions, whether we do so explicitly (e.g., facial expression, speech) or implicitly (e.g., text, image). Enabling machines to have emotional intelligence, i.e., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER). We begin with a brief introduction on widely used emotion representation models and affective modalities. We then summarize existing emotion annotation strategies and corresponding computational tasks, followed by the description of main challenges in MER. Furthermore, we present some representative approaches on representation learning of each affective modality, feature fusion of different affective modalities, classifier optimization for MER, and domain adaptation for MER. Finally, we outline several real-world applications and discuss some future directions.
△ Less
Submitted 18 August, 2021;
originally announced August 2021.
-
An Edge Computing Paradigm for Massive IoT Connectivity over High-Altitude Platform Networks
Authors:
Malong Ke,
Zhen Gao,
Yang Huang,
Guoru Ding,
Derrick Wing Kwan Ng,
Qihui Wu,
Jun Zhang
Abstract:
With the advent of the Internet-of-Things (IoT) era, the ever-increasing number of devices and emerging applications have triggered the need for ubiquitous connectivity and more efficient computing paradigms. These stringent demands have posed significant challenges to the current wireless networks and their computing architectures. In this article, we propose a high-altitude platform (HAP) networ…
▽ More
With the advent of the Internet-of-Things (IoT) era, the ever-increasing number of devices and emerging applications have triggered the need for ubiquitous connectivity and more efficient computing paradigms. These stringent demands have posed significant challenges to the current wireless networks and their computing architectures. In this article, we propose a high-altitude platform (HAP) network-enabled edge computing paradigm to tackle the key issues of massive IoT connectivity. Specifically, we first provide a comprehensive overview of the recent advances in non-terrestrial network-based edge computing architectures. Then, the limitations of the existing solutions are further summarized from the perspectives of the network architecture, random access procedure, and multiple access techniques. To overcome the limitations, we propose a HAP-enabled aerial cell-free massive multiple-input multiple-output network to realize the edge computing paradigm, where multiple HAPs cooperate via the edge servers to serve IoT devices. For the case of a massive number of devices, we further adopt a grant-free massive access scheme to guarantee low-latency and high-efficiency massive IoT connectivity to the network. Besides, a case study is provided to demonstrate the effectiveness of the proposed solution. Finally, to shed light on the future research directions of HAP network-enabled edge computing paradigms, the key challenges and open issues are discussed.
△ Less
Submitted 25 June, 2021;
originally announced June 2021.
-
Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans
Authors:
Xin He,
Shihao Wang,
Xiaowen Chu,
Shaohuai Shi,
Jiangping Tang,
Xin Liu,
Chenggang Yan,
Jiyong Zhang,
Guiguang Ding
Abstract:
The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may r…
▽ More
The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may result in low quality models as the real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with a relatively unfair comparison. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3\_18) to establish the baseline performance on the three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification with the Gumbel Softmax technique to improve the searching efficiency. We further exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results. The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets with tens of times smaller model size and higher accuracy. Furthermore, the results also verify that CAM can be well applied in CovidNet3D for COVID-19 datasets to provide interpretability for medical diagnosis.
△ Less
Submitted 12 February, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
-
3D Spectrum Mapping Based on ROI-Driven UAV Deployment
Authors:
Qihui Wu,
Feng Shen,
Zheng Wang,
Guoru Ding
Abstract:
Given the explosive growth of Internet of Things (IoT) devices ranging from the two-dimensional (2D) ground to the three-dimensional (3D) space, it is a necessity to establish a 3D spectrum map to comprehensively present and effectively manage the 3D spatial spectrum resources in smart city infrastructures. By leveraging the popularity and location flexibility of the unmanned aerial vehicles (UAVs…
▽ More
Given the explosive growth of Internet of Things (IoT) devices ranging from the two-dimensional (2D) ground to the three-dimensional (3D) space, it is a necessity to establish a 3D spectrum map to comprehensively present and effectively manage the 3D spatial spectrum resources in smart city infrastructures. By leveraging the popularity and location flexibility of the unmanned aerial vehicles (UAVs), we are able to execute spatial sampling with these emerging flying spectrum-monitoring devices (SMDs) at will. In this paper, we first present a brief survey to show the state-of-the-art studies on spectrum mapping. Then, we introduce the 3D spectrum mapping model. Next, we propose a 3D spectrum mapping framework which is composed of pre-sampling, spectrum situation estimation, UAV deployment and spectrum recovery. Therein we develop a Region of Interest (ROI)-driven UAV deployment scheme, which selects new sampling points of the highest estimated interest and the lowest energy cost iteratively. Meanwhile, we slice the entire 3D spectrum map into a series of "images" and "repair" those unsampled locations. Furthermore, we provide an exemplary case study on the 3D spectrum mapping, where, for example, an important event is being held and the entire spectrum situation needs to be monitored in real time to deal with malicious interference sources. Lastly, the challenges and open issues are discussed.
△ Less
Submitted 6 August, 2020;
originally announced August 2020.
-
Neural Kalman Filtering for Speech Enhancement
Authors:
Wei Xue,
Gang Quan,
Chao Zhang,
Guohong Ding,
Xiaodong He,
Bowen Zhou
Abstract:
Statistical signal processing based speech enhancement methods adopt expert knowledge to design the statistical models and linear filters, which is complementary to the deep neural network (DNN) based methods which are data-driven. In this paper, by using expert knowledge from statistical signal processing for network design and optimization, we extend the conventional Kalman filtering (KF) to the…
▽ More
Statistical signal processing based speech enhancement methods adopt expert knowledge to design the statistical models and linear filters, which is complementary to the deep neural network (DNN) based methods which are data-driven. In this paper, by using expert knowledge from statistical signal processing for network design and optimization, we extend the conventional Kalman filtering (KF) to the supervised learning scheme, and propose the neural Kalman filtering (NKF) for speech enhancement. Two intermediate clean speech estimates are first produced from recurrent neural networks (RNN) and linear Wiener filtering (WF) separately and are then linearly combined by a learned NKF gain to yield the NKF output. Supervised joint training is applied to NKF to learn to automatically trade-off between the instantaneous linear estimation made by the WF and the long-term non-linear estimation made by the RNN. The NKF method can be seen as using expert knowledge from WF to regularize the RNN estimations to improve its generalization ability to the noise conditions unseen in the training. Experiments in different noisy conditions show that the proposed method outperforms the baseline methods both in terms of objective evaluation metrics and automatic speech recognition (ASR) word error rates (WERs).
△ Less
Submitted 16 April, 2021; v1 submitted 27 July, 2020;
originally announced July 2020.
-
ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
Authors:
Xiaohan Ding,
Tianxiang Hao,
Jianchao Tan,
Ji Liu,
Jungong Han,
Yuchen Guo,
Guiguang Ding
Abstract:
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain th…
▽ More
We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by the neurobiology research about the independence of remembering and forgetting, we propose to re-parameterize a CNN into the remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. Via training with regular SGD on the former but a novel update rule with penalty gradients on the latter, we realize structured sparsity. Then we equivalently merge the remembering and forgetting parts into the original architecture with narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. Such a methodology distinguishes ResRep from the traditional learning-based pruning paradigm that applies a penalty on parameters to produce sparsity, which may suppress the parameters essential for the remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower one with only 45% FLOPs and no accuracy drop, which is the first to achieve lossless pruning with such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.
△ Less
Submitted 14 August, 2021; v1 submitted 7 July, 2020;
originally announced July 2020.
-
0.71-Å resolution electron tomography enabled by deep learning aided information recovery
Authors:
Chunyang Wang,
Guanglei Ding,
Yitong Liu,
Huolin L. Xin
Abstract:
Electron tomography, as an important 3D imaging method, offers a powerful method to probe the 3D structure of materials from the nano- to the atomic-scale. However, as a grant challenge, radiation intolerance of the nanoscale samples and the missing-wedge-induced information loss and artifacts greatly hindered us from obtaining 3D atomic structures with high fidelity. Here, for the first time, by…
▽ More
Electron tomography, as an important 3D imaging method, offers a powerful method to probe the 3D structure of materials from the nano- to the atomic-scale. However, as a grant challenge, radiation intolerance of the nanoscale samples and the missing-wedge-induced information loss and artifacts greatly hindered us from obtaining 3D atomic structures with high fidelity. Here, for the first time, by combining generative adversarial models with state-of-the-art network architectures, we demonstrate the resolution of electron tomography can be improved to 0.71 angstrom which is the highest three-dimensional imaging resolution that has been reported thus far. We also show it is possible to recover the lost information and remove artifacts in the reconstructed tomograms by only acquiring data from -50 to +50 degrees (44% reduction of dosage compared to -90 to +90 degrees full tilt series). In contrast to conventional methods, the deep learning model shows outstanding performance for both macroscopic objects and atomic features solving the long-standing dosage and missing-wedge problems in electron tomography. Our work provides important guidance for the application of machine learning methods to tomographic imaging and sheds light on its applications in other 3D imaging techniques.
△ Less
Submitted 27 March, 2020;
originally announced March 2020.
-
Cascaded Deep Neural Networks for Retinal Layer Segmentation of Optical Coherence Tomography with Fluid Presence
Authors:
Donghuan Lu,
Morgan Heisler,
Da Ma,
Setareh Dabiri,
Sieun Lee,
Gavin Weiguang Ding,
Marinko V. Sarunic,
Mirza Faisal Beg
Abstract:
Optical coherence tomography (OCT) is a non-invasive imaging technology which can provide micrometer-resolution cross-sectional images of the inner structures of the eye. It is widely used for the diagnosis of ophthalmic diseases with retinal alteration, such as layer deformation and fluid accumulation. In this paper, a novel framework was proposed to segment retinal layers with fluid presence. Th…
▽ More
Optical coherence tomography (OCT) is a non-invasive imaging technology which can provide micrometer-resolution cross-sectional images of the inner structures of the eye. It is widely used for the diagnosis of ophthalmic diseases with retinal alteration, such as layer deformation and fluid accumulation. In this paper, a novel framework was proposed to segment retinal layers with fluid presence. The main contribution of this study is two folds: 1) we developed a cascaded network framework to incorporate the prior structural knowledge; 2) we proposed a novel deep neural network based on U-Net and fully convolutional network, termed LF-UNet. Cross validation experiments proved that the proposed LF-UNet has superior performance comparing with the state-of-the-art methods, and incorporating the relative distance map structural prior information could further improve the performance regardless the network.
△ Less
Submitted 6 December, 2019;
originally announced December 2019.
-
Towards Scalable Koopman Operator Learning: Convergence Rates and A Distributed Learning Algorithm
Authors:
Zhiyuan Liu,
Guohui Ding,
Lijun Chen,
Enoch Yeung
Abstract:
We propose an alternating optimization algorithm to the nonconvex Koopman operator learning problem for nonlinear dynamic systems. We show that the proposed algorithm will converge to a critical point with rate $O(1/T)$ and $O(\frac{1}{\log T})$ for the constant and diminishing learning rates, respectively, under some mild conditions. To cope with the high dimensional nonlinear dynamical systems,…
▽ More
We propose an alternating optimization algorithm to the nonconvex Koopman operator learning problem for nonlinear dynamic systems. We show that the proposed algorithm will converge to a critical point with rate $O(1/T)$ and $O(\frac{1}{\log T})$ for the constant and diminishing learning rates, respectively, under some mild conditions. To cope with the high dimensional nonlinear dynamical systems, we present the first-ever distributed Koopman operator learning algorithm. We show that the distributed Koopman operator learning has the same convergence properties as the centralized Koopman operator learning, in the absence of optimal tracker, so long as the basis functions satisfy a set of state-based decomposition conditions. Numerical experiments are provided to complement our theoretical results.
△ Less
Submitted 20 March, 2020; v1 submitted 30 September, 2019;
originally announced September 2019.
-
Accelerating DNN Training in Wireless Federated Edge Learning Systems
Authors:
Jinke Ren,
Guanding Yu,
Guangyao Ding
Abstract:
Training task in classical machine learning models, such as deep neural networks, is generally implemented at a remote cloud center for centralized learning, which is typically time-consuming and resource-hungry. It also incurs serious privacy issue and long communication latency since a large amount of data are transmitted to the centralized node. To overcome these shortcomings, we consider a new…
▽ More
Training task in classical machine learning models, such as deep neural networks, is generally implemented at a remote cloud center for centralized learning, which is typically time-consuming and resource-hungry. It also incurs serious privacy issue and long communication latency since a large amount of data are transmitted to the centralized node. To overcome these shortcomings, we consider a newly-emerged framework, namely federated edge learning, to aggregate local learning updates at the network edge in lieu of users' raw data. Aiming at accelerating the training process, we first define a novel performance evaluation criterion, called learning efficiency. We then formulate a training acceleration optimization problem in the CPU scenario, where each user device is equipped with CPU. The closed-form expressions for joint batchsize selection and communication resource allocation are developed and some insightful results are highlighted. Further, we extend our learning framework to the GPU scenario. The optimal solution in this scenario is manifested to have the similar structure as that of the CPU scenario, recommending that our proposed algorithm is applicable in more general systems. Finally, extensive experiments validate the theoretical analysis and demonstrate that the proposed algorithm can reduce the training time and improve the learning accuracy simultaneously.
△ Less
Submitted 25 October, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
-
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Authors:
Kong Aik Lee,
Ville Hautamaki,
Tomi Kinnunen,
Hitoshi Yamamoto,
Koji Okabe,
Ville Vestman,
Jing Huang,
Guohong Ding,
Hanwu Sun,
Anthony Larcher,
Rohan Kumar Das,
Haizhou Li,
Mickael Rouvier,
Pierre-Michel Bousquet,
Wei Rao,
Qing Wang,
Chunlei Zhang,
Fahimeh Bahmaninezhad,
Hector Delgado,
Jose Patino,
Qiongqiong Wang,
Ling Guo,
Takafumi Koshinaka,
Jiacen Zhang,
Koichi Shinoda
, et al. (21 additional authors not shown)
Abstract:
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…
▽ More
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view on the advancements, progresses, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation.
△ Less
Submitted 15 April, 2019;
originally announced April 2019.
-
Deep Learning for Signal Demodulation in Physical Layer Wireless Communications: Prototype Platform, Open Dataset, and Analytics
Authors:
Hongmei Wang,
Zhenzhen Wu,
Shuai Ma,
Songtao Lu,
Han Zhang,
Guoru Ding,
Shiyin Li
Abstract:
In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems. Specifically, we propose a flexible communication prototype platform for measuring real modulation dataset. Then, based on the measured dataset, two DL-based demodulators, called deep belief network (DBN)-support vec…
▽ More
In this paper, we investigate deep learning (DL)-enabled signal demodulation methods and establish the first open dataset of real modulated signals for wireless communication systems. Specifically, we propose a flexible communication prototype platform for measuring real modulation dataset. Then, based on the measured dataset, two DL-based demodulators, called deep belief network (DBN)-support vector machine (SVM) demodulator and adaptive boosting (AdaBoost) based demodulator, are proposed. The proposed DBN-SVM based demodulator exploits the advantages of both DBN and SVM, i.e., the advantage of DBN as a feature extractor and SVM as a feature classifier. In DBN-SVM based demodulator, the received signals are normalized before being fed to the DBN network. Furthermore, an AdaBoost based demodulator is developed, which employs the $k$-Nearest Neighbor (KNN) as a weak classifier to form a strong combined classifier. Finally, experimental results indicate that the proposed DBN-SVM based demodulator and AdaBoost based demodulator are superior to the single classification method using DBN, SVM, and maximum likelihood (MLD) based demodulator.
△ Less
Submitted 8 March, 2019;
originally announced March 2019.
-
Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification
Authors:
Yun Tang,
Guohong Ding,
Jing Huang,
Xiaodong He,
Bowen Zhou
Abstract:
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural network (TDNN) and long short-term memory neural networks (LSTM) to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN…
▽ More
This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure using both time delay neural network (TDNN) and long short-term memory neural networks (LSTM) to generate complementary speaker information at different levels; (2) a multi-level pooling strategy to collect speaker information from both TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer to make the extracted embeddings suitable for the following fusion step. The synergy of these improvements are shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and SRE 2018 dev test (with a 9% EER reduction), as well as more than 10% DCF scores reduction on these two test sets over the x-vector baseline.
△ Less
Submitted 20 February, 2019;
originally announced February 2019.
-
Network-Connected UAV Communications: Potentials and Challenges
Authors:
Haichao Wang,
Jinlong Wang,
Jin Chen,
Yuping Gong,
Guoru Ding
Abstract:
This article explores the use of network-connected unmanned aerial vehicle (UAV) communications as a compelling solution to achieve high-rate information transmission and support ultra-reliable UAV remote command and control. We first discuss the use cases of UAVs and the resulting communication requirements, accompanied with a flexible architecture for network-connected UAV communications. Then,…
▽ More
This article explores the use of network-connected unmanned aerial vehicle (UAV) communications as a compelling solution to achieve high-rate information transmission and support ultra-reliable UAV remote command and control. We first discuss the use cases of UAVs and the resulting communication requirements, accompanied with a flexible architecture for network-connected UAV communications. Then, the signal transmission and interference characteristics are theoretically analyzed, and subsequently we highlight the design and optimization considerations, including antenna design, non-orthogonal multiple access communications, as well as network selection and association optimization. Finally, case studies are provided to show the feasibility of network-connected UAV communications.
△ Less
Submitted 29 May, 2018;
originally announced June 2018.
-
D2D Communications Underlaying Wireless Powered Communication Networks
Authors:
Haichao Wang,
Jinlong Wang,
Guoru Ding,
Zhu Han
Abstract:
In this paper, we investigate the resource allocation problem for D2D communications underlaying wireless powered communication networks, where multiple D2D pairs harvest energy from a power station equipped with multiple antennas and then transmit information signals simultaneously over the same spectrum resource. The aim is to maximize the sum throughput via joint time scheduling and power contr…
▽ More
In this paper, we investigate the resource allocation problem for D2D communications underlaying wireless powered communication networks, where multiple D2D pairs harvest energy from a power station equipped with multiple antennas and then transmit information signals simultaneously over the same spectrum resource. The aim is to maximize the sum throughput via joint time scheduling and power control, while satisfying the energy causality constraints. The formulated non-convex problem is first transformed into a nonlinear fractional programming problem with a tactful reformulation. Then, by leveraging D.C. (difference of two convex functions) programming, a suboptimal solution to the non-convex problem is obtained by iteratively solving a sequence of convex problems. Simulation results demonstrate that the proposed scheme works well in different scenarios and can significantly improve the system throughput compared with the-state-of-the-art schemes.
△ Less
Submitted 29 May, 2018;
originally announced June 2018.
-
Unmanned Aerial Vehicle-Aided Communications: Joint Transmit Power and Trajectory Optimization
Authors:
Haichao Wang,
Guochun Ren,
Jin Chen,
Guoru Ding,
Yijun Yang
Abstract:
This letter investigates the transmit power and trajectory optimization problem for unmanned aerial vehicle (UAV)-aided networks. Different from majority of the existing studies with fixed communication infrastructure, a dynamic scenario is considered where a flying UAV provides wireless services for multiple ground nodes simultaneously. To fully exploit the controllable channel variations provide…
▽ More
This letter investigates the transmit power and trajectory optimization problem for unmanned aerial vehicle (UAV)-aided networks. Different from majority of the existing studies with fixed communication infrastructure, a dynamic scenario is considered where a flying UAV provides wireless services for multiple ground nodes simultaneously. To fully exploit the controllable channel variations provided by the UAV's mobility, the UAV's transmit power and trajectory are jointly optimized to maximize the minimum average throughput within a given time length. For the formulated non-convex optimization with power budget and trajectory constraints, this letter presents an efficient joint transmit power and trajectory optimization algorithm. Simulation results validate the effectiveness of the proposed algorithm and reveal that the optimized transmit power shows a water-filling characteristic in spatial domain.
△ Less
Submitted 14 January, 2018;
originally announced January 2018.
-
Spectrum Sensing under Spectrum Misuse Behaviors: A Multi-Hypothesis Test Perspective
Authors:
Linyuan Zhang,
Guoru Ding,
Qihui Wu,
Zhu Han
Abstract:
Spectrum misuse behaviors, brought either by illegitimate access or by rogue power emission, endanger the legitimate communication and deteriorate the spectrum usage environment. In this paper, our aim is to detect whether the spectrum band is occupied, and if it is occupied, recognize whether the misuse behavior exists. One vital challenge is that the legitimate spectrum exploitation and misuse b…
▽ More
Spectrum misuse behaviors, brought either by illegitimate access or by rogue power emission, endanger the legitimate communication and deteriorate the spectrum usage environment. In this paper, our aim is to detect whether the spectrum band is occupied, and if it is occupied, recognize whether the misuse behavior exists. One vital challenge is that the legitimate spectrum exploitation and misuse behaviors coexist and the illegitimate user may act in an intermittent and fast-changing manner, which brings about much uncertainty for spectrum sensing. To tackle it, we firstly formulate the spectrum sensing problems under illegitimate access and rogue power emission as a uniform ternary hypothesis test. Then, we develop a novel test criterion, named the generalized multi-hypothesis N-P criterion. Following the criterion, we derive two test rules based on the generalized likelihood ratio test and the R-test, respectively, whose asymptotic performances are analyzed and an upper bound is also given. Furthermore, a cooperative spectrum sensing scheme is designed based on the global N-P criterion to further improve the detection performances. In addition, extensive simulations are provided to verify the proposed schemes' performance under various parameter configurations.
△ Less
Submitted 29 November, 2017;
originally announced November 2017.