-
MambaIC: State Space Models for High-Performance Learned Image Compression
Authors:
Fanhu Zeng,
Hao Tang,
Yihua Shao,
Siyu Chen,
Ling Shao,
Yan Wang
Abstract:
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields. Despite rapid progress in image compression, computational inefficiency and poor redundancy modeling still pose significant bottlenecks, limiting practical applications. Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage…
▽ More
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields. Despite rapid progress in image compression, computational inefficiency and poor redundancy modeling still pose significant bottlenecks, limiting practical applications. Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods and improve image compression from multiple perspectives. In this paper, we integrate the advantages of SSMs for better efficiency-performance trade-off and propose an enhanced image compression approach through refined context modeling, which we term MambaIC. Specifically, we explore context modeling to adaptively refine the representation of hidden states. Additionally, we introduce window-based local attention into channel-spatial entropy modeling to reduce potential spatial redundancy during compression, thereby increasing efficiency. Comprehensive qualitative and quantitative results validate the effectiveness and efficiency of our approach, particularly for high-resolution image compression. Code is released at https://github.com/AuroraZengfh/MambaIC.
△ Less
Submitted 19 March, 2025; v1 submitted 16 March, 2025;
originally announced March 2025.
-
Adaptive Wall-Following Control for Unmanned Ground Vehicles Using Spiking Neural Networks
Authors:
Hengye Yang,
Yanxiao Chen,
Zexuan Fan,
Lin Shao,
Tao Sun
Abstract:
Unmanned ground vehicles operating in complex environments must adaptively adjust to modeling uncertainties and external disturbances to perform tasks such as wall following and obstacle avoidance. This paper introduces an adaptive control approach based on spiking neural networks for wall fitting and tracking, which learns and adapts to unforeseen disturbances. We propose real-time wall-fitting a…
▽ More
Unmanned ground vehicles operating in complex environments must adaptively adjust to modeling uncertainties and external disturbances to perform tasks such as wall following and obstacle avoidance. This paper introduces an adaptive control approach based on spiking neural networks for wall fitting and tracking, which learns and adapts to unforeseen disturbances. We propose real-time wall-fitting algorithms to model unknown wall shapes and generate corresponding trajectories for the vehicle to follow. A discretized linear quadratic regulator is developed to provide a baseline control signal based on an ideal vehicle model. Point matching algorithms then identify the nearest matching point on the trajectory to generate feedforward control inputs. Finally, an adaptive spiking neural network controller, which adjusts its connection weights online based on error signals, is integrated with the aforementioned control algorithms. Numerical simulations demonstrate that this adaptive control framework outperforms the traditional linear quadratic regulator in tracking complex trajectories and following irregular walls, even in the presence of partial actuator failures and state estimation errors.
△ Less
Submitted 1 March, 2025;
originally announced March 2025.
-
Versatile Cataract Fundus Image Restoration Model Utilizing Unpaired Cataract and High-quality Images
Authors:
Zheng Gong,
Zhuo Deng,
Weihao Gao,
Wenda Zhou,
Yuhang Yang,
Hanqing Zhao,
Zhiyuan Niu,
Lei Shao,
Wenbin Wei,
Lan Ma
Abstract:
Cataract is one of the most common blinding eye diseases and can be treated by surgery. However, because cataract patients may also suffer from other blinding eye diseases, ophthalmologists must diagnose them before surgery. The cloudy lens of cataract patients forms a hazy degeneration in the fundus images, making it challenging to observe the patient's fundus vessels, which brings difficulties t…
▽ More
Cataract is one of the most common blinding eye diseases and can be treated by surgery. However, because cataract patients may also suffer from other blinding eye diseases, ophthalmologists must diagnose them before surgery. The cloudy lens of cataract patients forms a hazy degeneration in the fundus images, making it challenging to observe the patient's fundus vessels, which brings difficulties to the diagnosis process. To address this issue, this paper establishes a new cataract image restoration method named Catintell. It contains a cataract image synthesizing model, Catintell-Syn, and a restoration model, Catintell-Res. Catintell-Syn uses GAN architecture with fully unsupervised data to generate paired cataract-like images with realistic style and texture rather than the conventional Gaussian degradation algorithm. Meanwhile, Catintell-Res is an image restoration network that can improve the quality of real cataract fundus images using the knowledge learned from synthetic cataract images. Extensive experiments show that Catintell-Res outperforms other cataract image restoration methods in PSNR with 39.03 and SSIM with 0.9476. Furthermore, the universal restoration ability that Catintell-Res gained from unpaired cataract images can process cataract images from various datasets. We hope the models can help ophthalmologists identify other blinding eye diseases of cataract patients and inspire more medical image restoration methods in the future.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
Double-Shot 3D Shape Measurement with a Dual-Branch Network for Structured Light Projection Profilometry
Authors:
Mingyang Lei,
Jingfan Fan,
Long Shao,
Hong Song,
Deqiang Xiao,
Danni Ai,
Tianyu Fu,
Ying Gu,
Jian Yang
Abstract:
The structured light (SL)-based three-dimensional (3D) measurement techniques with deep learning have been widely studied to improve measurement efficiency, among which fringe projection profilometry (FPP) and speckle projection profilometry (SPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconst…
▽ More
The structured light (SL)-based three-dimensional (3D) measurement techniques with deep learning have been widely studied to improve measurement efficiency, among which fringe projection profilometry (FPP) and speckle projection profilometry (SPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we propose a parallel dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet), to take advantage of convolutional operations and self-attention mechanisms for processing different SL modalities. Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images. To fully integrate complementary features, we design a double-stream attention aggregation module (DAAM) that consists of a parallel attention subnetwork for aggregating multi-scale spatial structure information. This module can dynamically retain local and global representations to the maximum extent. Moreover, an adaptive mixture density head with bimodal Gaussian distribution is proposed for learning a representation that is precise near discontinuities. Compared to the standard disparity regression strategy, this adaptive mixture head can effectively improve performance at object boundaries. Extensive experiments demonstrate that our method can reduce fringe order ambiguity while producing high-accuracy results on self-made datasets.
△ Less
Submitted 9 December, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning
Authors:
Yuanyuan Duan,
Haiyang Feng,
Zhiping Yu,
Hanming Wu,
Leilai Shao,
Xiaolei Zhu
Abstract:
With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on…
▽ More
With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on-interposer to mitigate the small signal noise and simultaneous switching noise (SSN). Traditional PDN optimization strategies in 2.5D systems primarily focus on reducing impedance by integrating decoupling capacitors (decaps) to lessen small signal noises. Unfortunately, relying solely on frequency-domain analysis has been proven inadequate for addressing coupled SSN, as indicated by our experimental results. In this work, we introduce a novel two-phase optimization flow using deep reinforcement learning to tackle both the on-chip small signal noise and SSN. Initially, we optimize the impedance in the frequency domain to maintain the small signal noise within acceptable limits while avoiding over-design. Subsequently, in the time domain, we refine the PDN to minimize the voltage violation integral (VVI), a more accurate measure of SSN severity. To the best of our knowledge, this is the first dual-domain optimization strategy that simultaneously addresses both the small signal noise and SSN propagation through strategic decap placement in on-chip and on-interposer PDNs, offering a significant step forward in the design of robust PDNs for 2.5D integrated systems.
△ Less
Submitted 20 May, 2025; v1 submitted 2 July, 2024;
originally announced July 2024.
-
TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach
Authors:
Weikun Peng,
Jun Lv,
Yuwei Zeng,
Haonan Chen,
Siheng Zhao,
Jichen Sun,
Cewu Lu,
Lin Shao
Abstract:
The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes…
▽ More
The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes used as subgoals, we first learn a teacher policy using privileged information. Then, we learn a student policy with point cloud observation by imitating teacher policy. Lastly, our pipeline applies learned policy to real-world execution. We demonstrate the effectiveness of TieBot in simulation and the real world. In the real-world experiment, a dual-arm robot successfully knots a tie, achieving 50% success rate among 10 trials. Videos can be found https://tiebots.github.io/.
△ Less
Submitted 19 October, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization
Authors:
Tiantian Geng,
Teng Wang,
Yanfu Zhang,
Jinming Duan,
Weili Guan,
Feng Zheng,
Ling shao
Abstract:
Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…
▽ More
Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio-Visual perception network, to achieve joint learning of TAL, SED and AVEL tasks for the first time. UniAV can leverage diverse data available in task-specific datasets, allowing the model to learn and share mutually beneficial knowledge across tasks and modalities. To tackle the challenges posed by substantial variations in datasets (size/domain/duration) and distinct task characteristics, we propose to uniformly encode visual and audio modalities of all videos to derive generic representations, while also designing task-specific experts to capture unique knowledge for each task. Besides, we develop a unified language-aware classifier by utilizing a pre-trained text encoder, enabling the model to flexibly detect various types of instances and previously unseen ones by simply changing prompts during inference. UniAV outperforms its single-task counterparts by a large margin with fewer parameters, achieving on-par or superior performances compared to state-of-the-art task-specific methods across ActivityNet 1.3, DESED and UnAV-100 benchmarks.
△ Less
Submitted 11 August, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Point Transformer with Federated Learning for Predicting Breast Cancer HER2 Status from Hematoxylin and Eosin-Stained Whole Slide Images
Authors:
Bao Li,
Zhenyu Liu,
Lizhi Shao,
Bensheng Qiu,
Hong Bu,
Jie Tian
Abstract:
Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation a…
▽ More
Directly predicting human epidermal growth factor receptor 2 (HER2) status from widely available hematoxylin and eosin (HE)-stained whole slide images (WSIs) can reduce technical costs and expedite treatment selection. Accurately predicting HER2 requires large collections of multi-site WSIs. Federated learning enables collaborative training of these WSIs without gigabyte-size WSIs transportation and data privacy concerns. However, federated learning encounters challenges in addressing label imbalance in multi-site WSIs from the real world. Moreover, existing WSI classification methods cannot simultaneously exploit local context information and long-range dependencies in the site-end feature representation of federated learning. To address these issues, we present a point transformer with federated learning for multi-site HER2 status prediction from HE-stained WSIs. Our approach incorporates two novel designs. We propose a dynamic label distribution strategy and an auxiliary classifier, which helps to establish a well-initialized model and mitigate label distribution variations across sites. Additionally, we propose a farthest cosine sampling based on cosine distance. It can sample the most distinctive features and capture the long-range dependencies. Extensive experiments and analysis show that our method achieves state-of-the-art performance at four sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can generalize to two unseen sites with 229 WSIs.
△ Less
Submitted 27 February, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing
Authors:
Ayoosh Bansal,
Yang Zhao,
James Zhu,
Sheng Cheng,
Yuliang Gu,
Hyung-Jin Yoon,
Hunmin Kim,
Naira Hovakimyan,
Lui Sha
Abstract:
Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou…
▽ More
Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments.
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Jade: A Differentiable Physics Engine for Articulated Rigid Bodies with Intersection-Free Frictional Contact
Authors:
Gang Yang,
Siyuan Luo,
Lin Shao
Abstract:
We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt…
▽ More
We present Jade, a differentiable physics engine for articulated rigid bodies. Jade models contacts as the Linear Complementarity Problem (LCP). Compared to existing differentiable simulations, Jade offers features including intersection-free collision simulation and stable LCP solutions for multiple frictional contacts. We use continuous collision detection to detect the time of impact and adopt the backtracking strategy to prevent intersection between bodies with complex geometry shapes. We derive the gradient calculation to ensure the whole simulation process is differentiable under the backtracking mechanism. We modify the popular Dantzig algorithm to get valid solutions under multiple frictional contacts. We conduct extensive experiments to demonstrate the effectiveness of our differentiable physics simulation over a variety of contact-rich tasks.
△ Less
Submitted 9 September, 2023;
originally announced September 2023.
-
Backup Plan Constrained Model Predictive Control with Guaranteed Stability
Authors:
Ran Tao,
Hunmin Kim,
Hyung-Jin Yoon,
Wenbin Wan,
Naira Hovakimyan,
Lui Sha,
Petros Voulgaris
Abstract:
This article proposes and evaluates a new safety concept called backup plan safety for path planning of autonomous vehicles under mission uncertainty using model predictive control (MPC). Backup plan safety is defined as the ability to complete an alternative mission when the primary mission is aborted. To include this new safety concept in control problems, we formulate a feasibility maximization…
▽ More
This article proposes and evaluates a new safety concept called backup plan safety for path planning of autonomous vehicles under mission uncertainty using model predictive control (MPC). Backup plan safety is defined as the ability to complete an alternative mission when the primary mission is aborted. To include this new safety concept in control problems, we formulate a feasibility maximization problem aiming to maximize the feasibility of the primary and alternative missions. The feasibility maximization problem is based on multi-objective MPC, where each objective (cost function) is associated with a different mission and balanced by a weight vector. Furthermore, the feasibility maximization problem incorporates additional control input horizons toward the alternative missions on top of the control input horizon toward the primary mission, denoted as multi-horizon inputs, to evaluate the cost for each mission. We develop the backup plan constrained MPC algorithm, which designs the weight vector that ensures asymptotic stability of the closed-loop system, and generates the optimal control input by solving the feasibility maximization problem with computational efficiency. The performance of the proposed algorithm is validated through simulations of a UAV path planning problem.
△ Less
Submitted 6 October, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Physical Deep Reinforcement Learning Towards Safety Guarantee
Authors:
Hongpeng Cao,
Yanbing Mao,
Lui Sha,
Marco Caccamo
Abstract:
Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learnin…
▽ More
Deep reinforcement learning (DRL) has achieved tremendous success in many complex decision-making tasks of autonomous systems with high-dimensional state and/or action spaces. However, the safety and stability still remain major concerns that hinder the applications of DRL to safety-critical autonomous systems. To address the concerns, we proposed the Phy-DRL: a physical deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: i) Lyapunov-like reward, and ii) residual control (i.e., integration of physics-model-based control and data-driven control). The concurrent physical reward and residual control empower the Phy-DRL the (mathematically) provable safety and stability guarantees. Through experiments on the inverted pendulum, we show that the Phy-DRL features guaranteed safety and stability and enhanced robustness, while offering remarkably accelerated training and enlarged reward.
△ Less
Submitted 29 March, 2023;
originally announced March 2023.
-
Compressed domain vibration detection and classification for distributed acoustic sensing
Authors:
Xingliang Shen,
Huan Wu,
Kun Zhu,
Yujia Li,
Hua Zheng,
Jialong Li,
Liyang Shao,
Perry Ping Shum,
Chao Lu
Abstract:
Distributed acoustic sensing (DAS) is a novel enabling technology that can turn existing fibre optic networks to distributed acoustic sensors. However, it faces the challenges of transmitting, storing, and processing massive streams of data which are orders of magnitude larger than that collected from point sensors. The gap between intensive data generated by DAS and modern computing system with l…
▽ More
Distributed acoustic sensing (DAS) is a novel enabling technology that can turn existing fibre optic networks to distributed acoustic sensors. However, it faces the challenges of transmitting, storing, and processing massive streams of data which are orders of magnitude larger than that collected from point sensors. The gap between intensive data generated by DAS and modern computing system with limited reading/writing speed and storage capacity imposes restrictions on many applications. Compressive sensing (CS) is a revolutionary signal acquisition method that allows a signal to be acquired and reconstructed with significantly fewer samples than that required by Nyquist-Shannon theorem. Though the data size is greatly reduced in the sampling stage, the reconstruction of the compressed data is however time and computation consuming. To address this challenge, we propose to map the feature extractor from Nyquist-domain to compressed-domain and therefore vibration detection and classification can be directly implemented in compressed-domain. The measured results show that our framework can be used to reduce the transmitted data size by 70% while achieves 99.4% true positive rate (TPR) and 0.04% false positive rate (TPR) along 5 km sensing fibre and 95.05% classification accuracy on a 5-class classification task.
△ Less
Submitted 27 December, 2022;
originally announced December 2022.
-
Perception Simplex: Verifiable Collision Avoidance in Autonomous Vehicles Amidst Obstacle Detection Faults
Authors:
Ayoosh Bansal,
Hunmin Kim,
Simon Yu,
Bo Li,
Naira Hovakimyan,
Marco Caccamo,
Lui Sha
Abstract:
Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in en…
▽ More
Advances in deep learning have revolutionized cyber-physical applications, including the development of Autonomous Vehicles. However, real-world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of Deep Neural Networks (DNN) in safety-critical tasks, particularly Perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation.
In this work, we propose Perception Simplex (PS), a fault-tolerant application architecture designed for obstacle detection and collision avoidance. We analyze an existing LiDAR-based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning-based perception systems yet. By employing verifiable obstacle detection algorithms, PS identifies obstacle existence detection faults in the output of unverifiable DNN-based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software-in-the-loop simulations, we demonstrate that PS provides predictable and deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.
△ Less
Submitted 28 November, 2023; v1 submitted 4 September, 2022;
originally announced September 2022.
-
Verifiable Obstacle Detection
Authors:
Ayoosh Bansal,
Hunmin Kim,
Simon Yu,
Bo Li,
Naira Hovakimyan,
Marco Caccamo,
Lui Sha
Abstract:
Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitabl…
▽ More
Perception of obstacles remains a critical safety concern for autonomous vehicles. Real-world collisions have shown that the autonomy faults leading to fatal collisions originate from obstacle existence detection. Open source autonomous driving implementations show a perception pipeline with complex interdependent Deep Neural Networks. These networks are not fully verifiable, making them unsuitable for safety-critical tasks.
In this work, we present a safety verification of an existing LiDAR based classical obstacle detection algorithm. We establish strict bounds on the capabilities of this obstacle detection algorithm. Given safety standards, such bounds allow for determining LiDAR sensor properties that would reliably satisfy the standards. Such analysis has as yet been unattainable for neural network based perception systems. We provide a rigorous analysis of the obstacle detection system with empirical results based on real-world sensor data.
△ Less
Submitted 30 August, 2022;
originally announced August 2022.
-
Learning Enriched Features for Fast Image Restoration and Enhancement
Authors:
Syed Waqas Zamir,
Aditya Arora,
Salman Khan,
Munawar Hayat,
Fahad Shahbaz Khan,
Ming-Hsuan Yang,
Ling Shao
Abstract:
Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based…
▽ More
Given a degraded input image, image restoration aims to recover the missing high-quality image content. Numerous applications demand effective image restoration, e.g., computational photography, surveillance, autonomous vehicles, and remote sensing. Significant advances in image restoration have been made in recent years, dominated by convolutional neural networks (CNNs). The widely-used CNN-based methods typically operate either on full-resolution or on progressively low-resolution representations. In the former case, spatial details are preserved but the contextual information cannot be precisely encoded. In the latter case, generated outputs are semantically reliable but spatially less accurate. This paper presents a new architecture with a holistic goal of maintaining spatially-precise high-resolution representations through the entire network, and receiving complementary contextual information from the low-resolution representations. The core of our approach is a multi-scale residual block containing the following key elements: (a) parallel multi-resolution convolution streams for extracting multi-scale features, (b) information exchange across the multi-resolution streams, (c) non-local attention mechanism for capturing contextual information, and (d) attention based multi-scale feature aggregation. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details. Extensive experiments on six real image benchmark datasets demonstrate that our method, named as MIRNet-v2 , achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement. The source code and pre-trained models are available at https://github.com/swz30/MIRNetv2
△ Less
Submitted 19 April, 2022;
originally announced May 2022.
-
Specificity-Preserving Federated Learning for MR Image Reconstruction
Authors:
Chun-Mei Feng,
Yunlu Yan,
Shanshan Wang,
Yong Xu,
Ling Shao,
Huazhu Fu
Abstract:
Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction by enabling multiple institutions to collaborate without needing to aggregate local data. However, the domain shift caused by different MR imaging protocols can substantially degrade the performance of FL models. Recent FL techniques tend to solve this by enhancing the general…
▽ More
Federated learning (FL) can be used to improve data privacy and efficiency in magnetic resonance (MR) image reconstruction by enabling multiple institutions to collaborate without needing to aggregate local data. However, the domain shift caused by different MR imaging protocols can substantially degrade the performance of FL models. Recent FL techniques tend to solve this by enhancing the generalization of the global model, but they ignore the domain-specific features, which may contain important information about the device properties and be useful for local reconstruction. In this paper, we propose a specificity-preserving FL algorithm for MR image reconstruction (FedMRI). The core idea is to divide the MR reconstruction model into two parts: a globally shared encoder to obtain a generalized representation at the global level, and a client-specific decoder to preserve the domain-specific properties of each client, which is important for collaborative reconstruction when the clients have unique distribution. Such scheme is then executed in the frequency space and the image space respectively, allowing exploration of generalized representation and client-specific properties simultaneously in different spaces. Moreover, to further boost the convergence of the globally shared encoder when a domain shift is present, a weighted contrastive regularization is introduced to directly correct any deviation between the client and server during optimization. Extensive experiments demonstrate that our FedMRI's reconstructed results are the closest to the ground-truth for multi-institutional data, and that it outperforms state-of-the-art FL methods.
△ Less
Submitted 22 August, 2022; v1 submitted 9 December, 2021;
originally announced December 2021.
-
Deep multi-modal aggregation network for MR image reconstruction with auxiliary modality
Authors:
Chun-Mei Feng,
Huazhu Fu,
Tianfei Zhou,
Yong Xu,
Ling Shao,
David Zhang
Abstract:
Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to say motion artifacts. Recently, many approaches have been developed to reconstruct full-sampled images from partially observed measurements to accelerate MR imaging. However, most approaches focused on reconstr…
▽ More
Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from a long acquisition time, which makes the image quality vulnerable to say motion artifacts. Recently, many approaches have been developed to reconstruct full-sampled images from partially observed measurements to accelerate MR imaging. However, most approaches focused on reconstruction over a single modality, neglecting the discovery of correlation knowledge between the different modalities. Here we propose a Multi-modal Aggregation network for mR Image recOnstruction with auxiliary modality (MARIO), which is capable of discovering complementary representations from a fully sampled auxiliary modality, with which to hierarchically guide the reconstruction of a given target modality. This implies that our method can selectively aggregate multi-modal representations for better reconstruction, yielding comprehensive, multi-scale, multi-modal feature fusion. Extensive experiments on IXI and fastMRI datasets demonstrate the superiority of the proposed approach over state-of-the-art MR image reconstruction methods in removing artifacts.
△ Less
Submitted 21 February, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Exploring Separable Attention for Multi-Contrast MR Image Super-Resolution
Authors:
Chun-Mei Feng,
Yunlu Yan,
Kai Yu,
Yong Xu,
Ling Shao,
Huazhu Fu
Abstract:
Super-resolving the Magnetic Resonance (MR) image of a target contrast under the guidance of the corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate different contrasts directly, ignoring their relationships in different clues, e.g.,…
▽ More
Super-resolving the Magnetic Resonance (MR) image of a target contrast under the guidance of the corresponding auxiliary contrast, which provides additional anatomical information, is a new and effective solution for fast MR imaging. However, current multi-contrast super-resolution (SR) methods tend to concatenate different contrasts directly, ignoring their relationships in different clues, e.g., in the high-intensity and low-intensity regions. In this study, we propose a separable attention network (comprising high-intensity priority attention and low-intensity separation attention), named SANet. Our SANet could explore the areas of high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image. SANet provides three appealing benefits: (1) It is the first model to explore a separable attention mechanism that uses the auxiliary contrast to predict the high-intensity and low-intensity regions regions, diverting more attention to refining any uncertain details between these regions and correcting the fine areas in the reconstructed results. (2) A multi-stage integration module is proposed to learn the response of multi-contrast fusion at multiple stages, get the dependency between the fused representations, and boost their representation ability. (3) Extensive experiments with various state-of-the-art multi-contrast SR methods on fastMRI and clinical \textit{in vivo} datasets demonstrate the superiority of our model.
△ Less
Submitted 21 August, 2022; v1 submitted 3 September, 2021;
originally announced September 2021.
-
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
Authors:
Bo Dong,
Wenhai Wang,
Deng-Ping Fan,
Jinpeng Li,
Huazhu Fu,
Ling Shao
Abstract:
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and…
▽ More
Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
△ Less
Submitted 19 February, 2024; v1 submitted 16 August, 2021;
originally announced August 2021.
-
Variational Topic Inference for Chest X-Ray Report Generation
Authors:
Ivona Najdenkoska,
Xiantong Zhen,
Marcel Worring,
Ling Shao
Abstract:
Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To…
▽ More
Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.
△ Less
Submitted 15 July, 2021;
originally announced July 2021.
-
Multi-Modal Transformer for Accelerated MR Imaging
Authors:
Chun-Mei Feng,
Yunlu Yan,
Geng Chen,
Yong Xu,
Ling Shao,
Huazhu Fu
Abstract:
Accelerated multi-modal magnetic resonance (MR) imaging is a new and effective solution for fast MR imaging, providing superior performance in restoring the target modality from its undersampled counterpart with guidance from an auxiliary modality. However, existing works simply combine the auxiliary modality as prior information, lacking in-depth investigations on the potential mechanisms for fus…
▽ More
Accelerated multi-modal magnetic resonance (MR) imaging is a new and effective solution for fast MR imaging, providing superior performance in restoring the target modality from its undersampled counterpart with guidance from an auxiliary modality. However, existing works simply combine the auxiliary modality as prior information, lacking in-depth investigations on the potential mechanisms for fusing different modalities. Further, they usually rely on the convolutional neural networks (CNNs), which is limited by the intrinsic locality in capturing the long-distance dependency. To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging. To capture deep multi-modal information, our MTrans utilizes an improved multi-head attention mechanism, named cross attention module, which absorbs features from the auxiliary modality that contribute to the target modality. Our framework provides three appealing benefits: (i) Our MTrans use an improved transformers for multi-modal MR imaging, affording more global information compared with existing CNN-based methods. (ii) A new cross attention module is proposed to exploit the useful information in each modality at different scales. The small patch in the target modality aims to keep more fine details, the large patch in the auxiliary modality aims to obtain high-level context features from the larger region and supplement the target modality effectively. (iii) We evaluate MTrans with various accelerated multi-modal MR imaging tasks, e.g., MR image reconstruction and super-resolution, where MTrans outperforms state-of-the-art methods on fastMRI and real-world clinical datasets.
△ Less
Submitted 11 May, 2022; v1 submitted 27 June, 2021;
originally announced June 2021.
-
DONet: Dual-Octave Network for Fast MR Image Reconstruction
Authors:
Chun-Mei Feng,
Zhanyuan Yang,
Huazhu Fu,
Yong Xu,
Jian Yang,
Ling Shao
Abstract:
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and ima…
▽ More
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration has long been the subject of research. This is commonly achieved by obtaining multiple undersampled images, simultaneously, through parallel imaging. In this paper, we propose the Dual-Octave Network (DONet), which is capable of learning multi-scale spatial-frequency features from both the real and imaginary components of MR data, for fast parallel MR image reconstruction. More specifically, our DONet consists of a series of Dual-Octave convolutions (Dual-OctConv), which are connected in a dense manner for better reuse of features. In each Dual-OctConv, the input feature maps and convolutional kernels are first split into two components (ie, real and imaginary), and then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intra-group information updating and inter-group information exchange to aggregate the contextual information across different groups. Our framework provides three appealing benefits: (i) It encourages information interaction and fusion between the real and imaginary components at various spatial frequencies to achieve richer representational capacity. (ii) The dense connections between the real and imaginary groups in each Dual-OctConv make the propagation of features more efficient by feature reuse. (iii) DONet enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. Extensive experiments on two popular datasets (ie, clinical knee and fastMRI), under different undersampling patterns and acceleration factors, demonstrate the superiority of our model in accelerated parallel MR image reconstruction.
△ Less
Submitted 12 June, 2021; v1 submitted 12 May, 2021;
originally announced May 2021.
-
Dual-Octave Convolution for Accelerated Parallel MR Image Reconstruction
Authors:
Chun-Mei Feng,
Zhanyuan Yang,
Geng Chen,
Yong Xu,
Ling Shao
Abstract:
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration by obtaining multiple undersampled images simultaneously through parallel imaging has always been the subject of research. In this paper, we propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components, f…
▽ More
Magnetic resonance (MR) image acquisition is an inherently prolonged process, whose acceleration by obtaining multiple undersampled images simultaneously through parallel imaging has always been the subject of research. In this paper, we propose the Dual-Octave Convolution (Dual-OctConv), which is capable of learning multi-scale spatial-frequency features from both real and imaginary components, for fast parallel MR image reconstruction. By reformulating the complex operations using octave convolutions, our model shows a strong ability to capture richer representations of MR images, while at the same time greatly reducing the spatial redundancy. More specifically, the input feature maps and convolutional kernels are first split into two components (i.e., real and imaginary), which are then divided into four groups according to their spatial frequencies. Then, our Dual-OctConv conducts intra-group information updating and inter-group information exchange to aggregate the contextual information across different groups. Our framework provides two appealing benefits: (i) it encourages interactions between real and imaginary components at various spatial frequencies to achieve richer representational capacity, and (ii) it enlarges the receptive field by learning multiple spatial-frequency features of both the real and imaginary components. We evaluate the performance of the proposed model on the acceleration of multi-coil MR image reconstruction. Extensive experiments are conducted on an {in vivo} knee dataset under different undersampling patterns and acceleration factors. The experimental results demonstrate the superiority of our model in accelerated parallel MR image reconstruction. Our code is available at: github.com/chunmeifeng/Dual-OctConv.
△ Less
Submitted 12 April, 2021;
originally announced April 2021.
-
Backup Plan Constrained Model Predictive Control
Authors:
Hunmin Kim,
Hyungjin Yoon,
Wenbin Wan,
Naira Hovakimyan,
Lui Sha,
Petros Voulgaris
Abstract:
This article proposes a new safety concept: backup plan safety. The backup plan safety is defined as the ability to complete one of the alternative missions in the case of primary mission abortion. To incorporate this new safety concept in control problems, we formulate a feasibility maximization problem that adopts additional (virtual) input horizons toward the alternative missions on top of the…
▽ More
This article proposes a new safety concept: backup plan safety. The backup plan safety is defined as the ability to complete one of the alternative missions in the case of primary mission abortion. To incorporate this new safety concept in control problems, we formulate a feasibility maximization problem that adopts additional (virtual) input horizons toward the alternative missions on top of the input horizon toward the primary mission. Cost functions for the primary and alternative missions construct multiple objectives, and multi-horizon inputs evaluate them. To address the feasibility maximization problem, we develop a multi-horizon multi-objective model predictive path integral control (3M) algorithm. Model predictive path integral control (MPPI) is a sampling-based scheme that can help the proposed algorithm deal with nonlinear dynamic systems and achieve computational efficiency by parallel computation. Simulations of the aerial vehicle and ground vehicle control problems demonstrate the new concept of backup plan safety and the performance of the proposed algorithm.
△ Less
Submitted 27 March, 2021;
originally announced March 2021.
-
Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net
Authors:
Yawen Huang,
Feng Zheng,
Danyang Wang,
Weilin Huang,
Matthew R. Scott,
Ling Shao
Abstract:
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity…
▽ More
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity of neuroimaging data remains another key challenge in leveraging existing randomized scans effectively, as data of the same modality is often measured differently by different machines. There is a clear need to go beyond the traditional imaging-dependent process and synthesize anatomically specific target-modality data from a source input. In this paper, we propose to learn dedicated features that cross both intre- and intra-modal variations using a novel CSC$\ell_4$Net. Through an initial unification of intra-modal data in the feature maps and multivariate canonical adaptation, CSC$\ell_4$Net facilitates feature-level mutual transformation. The positive definite Riemannian manifold-penalized data fidelity term further enables CSC$\ell_4$Net to reconstruct missing measurements according to transformed features. Finally, the maximization $\ell_4$-norm boils down to a computationally efficient optimization problem. Extensive experiments validate the ability and robustness of our CSC$\ell_4$Net compared to the state-of-the-art methods on multiple datasets.
△ Less
Submitted 22 March, 2021;
originally announced March 2021.
-
Variational Knowledge Distillation for Disease Classification in Chest X-Rays
Authors:
Tom van Sonsbeek,
Xiantong Zhen,
Marcel Worring,
Ling Shao
Abstract:
Disease classification relying solely on imaging data attracts great interest in medical image analysis. Current models could be further improved, however, by also employing Electronic Health Records (EHRs), which contain rich information on patients and findings from clinicians. It is challenging to incorporate this information into disease classification due to the high reliance on clinician inp…
▽ More
Disease classification relying solely on imaging data attracts great interest in medical image analysis. Current models could be further improved, however, by also employing Electronic Health Records (EHRs), which contain rich information on patients and findings from clinicians. It is challenging to incorporate this information into disease classification due to the high reliance on clinician input in EHRs, limiting the possibility for automated diagnosis. In this paper, we propose \textit{variational knowledge distillation} (VKD), which is a new probabilistic inference framework for disease classification based on X-rays that leverages knowledge from EHRs. Specifically, we introduce a conditional latent variable model, where we infer the latent representation of the X-ray image with the variational posterior conditioning on the associated EHR text. By doing so, the model acquires the ability to extract the visual features relevant to the disease during learning and can therefore perform more accurate classification for unseen patients at inference based solely on their X-ray scans. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs. The results show that the proposed variational knowledge distillation can consistently improve the performance of medical image classification and significantly surpasses current methods.
△ Less
Submitted 19 March, 2021;
originally announced March 2021.
-
Learning to Fuse Asymmetric Feature Maps in Siamese Trackers
Authors:
Wencheng Han,
Xingping Dong,
Fahad Shahbaz Khan,
Ling Shao,
Jianbing Shen
Abstract:
Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, ha…
▽ More
Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, has fewer activated channels, and provides weak discrimination of object boundaries. Further, DW-XCorr is a handcrafted parameter-free module and cannot fully benefit from offline learning on large-scale data. We propose a learnable module, called the asymmetric convolution (ACM), which learns to better capture the semantic correlation information in offline training on large-scale data. Different from DW-XCorr and its predecessor(XCorr), which regard a single feature map as the convolution kernel, our ACM decomposes the convolution operation on a concatenated feature map into two mathematically equivalent operations, thereby avoiding the need for the feature maps to be of the same size (width and height)during concatenation. Our ACM can incorporate useful prior information, such as bounding-box size, with standard visual features. Furthermore, ACM can easily be integrated into existing Siamese trackers based on DW-XCorror XCorr. To demonstrate its generalization ability, we integrate ACM into three representative trackers: SiamFC, SiamRPN++, and SiamBAN. Our experiments reveal the benefits of the proposed ACM, which outperforms existing methods on six tracking benchmarks. On the LaSOT test set, our ACM-based tracker obtains a significant improvement of 5.8% in terms of success (AUC), over the baseline.
△ Less
Submitted 30 March, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Finite-Time Model Inference From A Single Noisy Trajectory
Authors:
Yanbing Mao,
Naira Hovakimyan,
Petros Voulgaris,
Lui Sha
Abstract:
This paper proposes a novel model inference procedure to identify system matrix from a single noisy trajectory over a finite-time interval. The proposed inference procedure comprises an observation data processor, a redundant data processor and an ordinary least-square estimator, wherein the data processors mitigate the influence of observation noise on inference error. We first systematically inv…
▽ More
This paper proposes a novel model inference procedure to identify system matrix from a single noisy trajectory over a finite-time interval. The proposed inference procedure comprises an observation data processor, a redundant data processor and an ordinary least-square estimator, wherein the data processors mitigate the influence of observation noise on inference error. We first systematically investigate the comparisons with naive least-square-regression based model inference and uncover that 1) the same observation data has identical influence on the feasibility of the proposed and the naive model inferences, 2) the naive model inference uses all of the redundant data, while the proposed model inference optimally uses the basis and the redundant data. We then study the sample complexity of the proposed model inference in the presence of observation noise, which leads to the dependence of the processed bias in the observed system trajectory on time and coordinates. Particularly, we derive the sample-complexity upper bound (on the number of observations sufficient to infer a model with prescribed levels of accuracy and confidence) and the sample-complexity lower bound (high-probability lower bound on model error). Finally, the proposed model inference is numerically validated and analyzed.
△ Less
Submitted 1 January, 2021; v1 submitted 13 October, 2020;
originally announced October 2020.
-
Robust Vehicle Lane Keeping Control with Networked Proactive Adaptation
Authors:
Hunmin Kim,
Wenbin Wan,
Naira Hovakimyan,
Lui Sha,
Petros Voulgaris
Abstract:
Road condition is an important environmental factor for autonomous vehicle control. A dramatic change in the road condition from the nominal status is a source of uncertainty that can lead to a system failure. Once the vehicle encounters an uncertain environment, such as hitting an ice patch, it is too late to reduce the speed, and the vehicle can lose control. To cope with future uncertainties in…
▽ More
Road condition is an important environmental factor for autonomous vehicle control. A dramatic change in the road condition from the nominal status is a source of uncertainty that can lead to a system failure. Once the vehicle encounters an uncertain environment, such as hitting an ice patch, it is too late to reduce the speed, and the vehicle can lose control. To cope with future uncertainties in advance, we study a proactive robust adaptive control architecture for autonomous vehicles' lane-keeping control problems. The data center generates a prior environmental uncertainty estimate by combining weather forecasts and measurements from anonymous vehicles through a spatio-temporal filter. The prior estimate contributes to designing a robust heading controller and nominal longitudinal velocity for proactive adaptation to each new abnormal condition. Then the control parameters are updated based on posterior information fusion with on-board measurements.
△ Less
Submitted 28 September, 2020; v1 submitted 25 September, 2020;
originally announced September 2020.
-
GRAC: Self-Guided and Self-Regularized Actor-Critic
Authors:
Lin Shao,
Yifan You,
Mengyuan Yan,
Qingyun Sun,
Jeannette Bohg
Abstract:
Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main c…
▽ More
Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor critic algorithm. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.
△ Less
Submitted 10 November, 2020; v1 submitted 18 September, 2020;
originally announced September 2020.
-
Structure Preserving Stain Normalization of Histopathology Images Using Self-Supervised Semantic Guidance
Authors:
Dwarikanath Mahapatra,
Behzad Bozorgtabar,
Jean-Philippe Thiran,
Ling Shao
Abstract:
Although generative adversarial network (GAN) based style transfer is state of the art in histopathology color-stain normalization, they do not explicitly integrate structural information of tissues. We propose a self-supervised approach to incorporate semantic guidance into a GAN based stain normalization framework and preserve detailed structural information. Our method does not require manual s…
▽ More
Although generative adversarial network (GAN) based style transfer is state of the art in histopathology color-stain normalization, they do not explicitly integrate structural information of tissues. We propose a self-supervised approach to incorporate semantic guidance into a GAN based stain normalization framework and preserve detailed structural information. Our method does not require manual segmentation maps which is a significant advantage over existing methods. We integrate semantic information at different layers between a pre-trained semantic network and the stain color normalization network. The proposed scheme outperforms other color normalization methods leading to better classification and segmentation performance.
△ Less
Submitted 3 June, 2021; v1 submitted 5 August, 2020;
originally announced August 2020.
-
SL1-Simplex: Safe Velocity Regulation of Self-Driving Vehicles in Dynamic and Unforeseen Environments
Authors:
Yanbing Mao,
Yuliang Gu,
Naira Hovakimyan,
Lui Sha,
Petros Voulgaris
Abstract:
This paper proposes a novel extension of the Simplex architecture with model switching and model learning to achieve safe velocity regulation of self-driving vehicles in dynamic and unforeseen environments. To guarantee the reliability of autonomous vehicles, an $\mathcal{L}_{1}$ adaptive controller that compensates for uncertainties and disturbances is employed by the Simplex architecture as a ve…
▽ More
This paper proposes a novel extension of the Simplex architecture with model switching and model learning to achieve safe velocity regulation of self-driving vehicles in dynamic and unforeseen environments. To guarantee the reliability of autonomous vehicles, an $\mathcal{L}_{1}$ adaptive controller that compensates for uncertainties and disturbances is employed by the Simplex architecture as a verified safe controller to tolerate concurrent software and physical failures. Meanwhile, safe switching controller is incorporated into the Simplex for safe velocity regulation through the integration of the traction control system and anti-lock braking system. Specifically, the vehicle's angular and longitudinal velocities asymptotically track the provided references that vary with driving environments, while the wheel slips are restricted to safety envelopes to prevent slipping and sliding. Due to the high dependence of vehicle dynamics on the driving environments, the proposed Simplex leverages the finite-time model learning to timely learn and update the vehicle model for $\mathcal{L}_{1}$ adaptive controller, when any deviation from the safety envelope or the uncertainty measurement threshold occurs in the unforeseen driving environments. Finally, the effectiveness of the proposed Simplex architecture for safe velocity regulation is validated by the AutoRally platform.
△ Less
Submitted 1 February, 2022; v1 submitted 4 August, 2020;
originally announced August 2020.
-
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition
Authors:
Ionut Cosmin Duta,
Li Liu,
Fan Zhu,
Ling Shao
Abstract:
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our f…
▽ More
This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of details in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost and parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks on visual recognition: image classification, video action classification/recognition, object detection and semantic image segmentation/parsing. Our approach shows significant improvements over all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layers network outperforms in terms of recognition performance on ImageNet dataset its counterpart baseline ResNet with 152 layers, while having 2.39 times less parameters, 2.52 times lower computational complexity and more than 3 times less layers. On image segmentation, our novel framework sets a new state-of-the-art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv
△ Less
Submitted 20 June, 2020;
originally announced June 2020.
-
PraNet: Parallel Reverse Attention Network for Polyp Segmentation
Authors:
Deng-Ping Fan,
Ge-Peng Ji,
Tao Zhou,
Geng Chen,
Huazhu Fu,
Jianbing Shen,
Ling Shao
Abstract:
Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of…
▽ More
Colonoscopy is an effective technique for detecting colorectal polyps, which are highly related to colorectal cancer. In clinical practice, segmenting polyps from colonoscopy images is of great importance since it provides valuable information for diagnosis and surgery. However, accurate polyp segmentation is a challenging task, for two major reasons: (i) the same type of polyps has a diversity of size, color and texture; and (ii) the boundary between a polyp and its surrounding mucosa is not sharp. To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images. Specifically, we first aggregate the features in high-level layers using a parallel partial decoder (PPD). Based on the combined feature, we then generate a global map as the initial guidance area for the following components. In addition, we mine the boundary cues using a reverse attention (RA) module, which is able to establish the relationship between areas and boundary cues. Thanks to the recurrent cooperation mechanism between areas and boundaries, our PraNet is capable of calibrating any misaligned predictions, improving the segmentation accuracy. Quantitative and qualitative evaluations on five challenging datasets across six metrics show that our PraNet improves the segmentation accuracy significantly, and presents a number of advantages in terms of generalizability, and real-time segmentation efficiency.
△ Less
Submitted 3 July, 2020; v1 submitted 13 June, 2020;
originally announced June 2020.
-
M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients
Authors:
Tao Zhou,
Huazhu Fu,
Yu Zhang,
Changqing Zhang,
Xiankai Lu,
Jianbing Shen,
Ling Shao
Abstract:
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not repre…
▽ More
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Although many OS time prediction methods have been developed and obtain promising results, there are still several issues. First, conventional prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume, which may not represent the full image or model complex tumor patterns. Second, different types of scanners (i.e., multi-modal data) are sensitive to different brain regions, which makes it challenging to effectively exploit the complementary information across multiple modalities and also preserve the modality-specific properties. Third, existing methods focus on prediction models, ignoring complex data-to-label relationships. To address the above issues, we propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net). Specifically, we first project the 3D MR volume onto 2D images in different directions, which reduces computational costs, while preserving important information and enabling pre-trained models to be transferred from other tasks. Then, we use a modality-specific network to extract implicit and high-level features from different MR scans. A multi-modal shared network is built to fuse these features using a bilinear pooling model, exploiting their correlations to provide complementary information. Finally, we integrate the outputs from each modality-specific network and the multi-modal shared network to generate the final prediction result. Experimental results demonstrate the superiority of our M2Net model over other methods.
△ Less
Submitted 14 July, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Safety Constrained Multi-UAV Time Coordination: A Bi-level Control Framework in GPS Denied Environment
Authors:
Wenbin Wan,
Hunmin Kim,
Yikun Cheng,
Naira Hovakimyan,
Petros G. Voulgaris,
Lui Sha
Abstract:
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and pro…
▽ More
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can cause safety issues. To avoid intolerable sensor drifts while completing the time-critical coordination task for multi-UAV systems, we propose a safety constrained bi-level control framework. The first level is the time-critical coordination level that achieves a consensus of coordination states and provides a virtual target which is a function of the coordination state. The second level is the safety-critical control level that is designed to follow the virtual target while adapting the attacked UAV(s) at a path re-planning level to support resilient state estimation. In particular, the time-critical coordination level framework generates the desired speed and position profile of the virtual target based on the multi-UAV cooperative mission by the proposed consensus protocol algorithm. The safety-critical control level is able to make each UAV follow its assigned path while detecting the attacks, estimating the state resiliently, and driving the UAV(s) outside the effective range of the spoofing device within the escape time. The numerical simulations of a three-UAV system demonstrate the effectiveness of the proposed safety constrained bi-level control framework.
△ Less
Submitted 19 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Modeling and Enhancing Low-quality Retinal Fundus Images
Authors:
Ziyi Shen,
Huazhu Fu,
Jianbing Shen,
Ling Shao
Abstract:
Retinal fundus images are widely used for the clinical screening and diagnosis of eye diseases. However, fundus images captured by operators with various levels of experience have a large variation in quality. Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. However, due to the special optical beam of fundus imaging and structure of the r…
▽ More
Retinal fundus images are widely used for the clinical screening and diagnosis of eye diseases. However, fundus images captured by operators with various levels of experience have a large variation in quality. Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. However, due to the special optical beam of fundus imaging and structure of the retina, natural image enhancement methods cannot be utilized directly to address this. In this paper, we first analyze the ophthalmoscope imaging system and simulate a reliable degradation of major inferior-quality factors, including uneven illumination, image blurring, and artifacts. Then, based on the degradation model, a clinically oriented fundus enhancement network (cofe-Net) is proposed to suppress global degradation factors, while simultaneously preserving anatomical retinal structures and pathological characteristics for clinical observation and analysis. Experiments on both synthetic and real images demonstrate that our algorithm effectively corrects low-quality fundus images without losing retinal details. Moreover, we also show that the fundus correction method can benefit medical image analysis applications, e.g., retinal vessel segmentation and optic disc/cup detection.
△ Less
Submitted 9 December, 2020; v1 submitted 12 May, 2020;
originally announced May 2020.
-
Inf-Net: Automatic COVID-19 Lung Infection Segmentation from CT Images
Authors:
Deng-Ping Fan,
Tao Zhou,
Ge-Peng Ji,
Yi Zhou,
Geng Chen,
Huazhu Fu,
Jianbing Shen,
Ling Shao
Abstract:
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in…
▽ More
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis. Automated detection of lung infections from computed tomography (CT) images offers a great potential to augment the traditional healthcare strategy for tackling COVID-19. However, segmenting infected regions from CT slices faces several challenges, including high variation in infection characteristics, and low intensity contrast between infections and normal tissues. Further, collecting a large amount of data is impractical within a short time period, inhibiting the training of a deep model. To address these challenges, a novel COVID-19 Lung Infection Segmentation Deep Network (Inf-Net) is proposed to automatically identify infected regions from chest CT slices. In our Inf-Net, a parallel partial decoder is used to aggregate the high-level features and generate a global map. Then, the implicit reverse attention and explicit edge-attention are utilized to model the boundaries and enhance the representations. Moreover, to alleviate the shortage of labeled data, we present a semi-supervised segmentation framework based on a randomly selected propagation strategy, which only requires a few labeled images and leverages primarily unlabeled data. Our semi-supervised framework can improve the learning ability and achieve a higher performance. Extensive experiments on our COVID-SemiSeg and real CT volumes demonstrate that the proposed Inf-Net outperforms most cutting-edge segmentation models and advances the state-of-the-art performance.
△ Less
Submitted 21 May, 2020; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Design and Control of Roller Grasper V2 for In-Hand Manipulation
Authors:
Shenli Yuan,
Lin Shao,
Connor L. Yako,
Alex Gruebele,
J. Kenneth Salisbury
Abstract:
The ability to perform in-hand manipulation still remains an unsolved problem; having this capability would allow robots to perform sophisticated tasks requiring repositioning and reorienting of grasped objects. In this work, we present a novel non-anthropomorphic robot grasper with the ability to manipulate objects by means of active surfaces at the fingertips. Active surfaces are achieved by sph…
▽ More
The ability to perform in-hand manipulation still remains an unsolved problem; having this capability would allow robots to perform sophisticated tasks requiring repositioning and reorienting of grasped objects. In this work, we present a novel non-anthropomorphic robot grasper with the ability to manipulate objects by means of active surfaces at the fingertips. Active surfaces are achieved by spherical rolling fingertips with two degrees of freedom (DoF) -- a pivoting motion for surface reorientation -- and a continuous rolling motion for moving the object. A further DoF is in the base of each finger, allowing the fingers to grasp objects over a range of size and shapes. Instantaneous kinematics was derived and objects were successfully manipulated both with a custom handcrafted control scheme as well as one learned through imitation learning, in simulation and experimentally on the hardware.
△ Less
Submitted 17 November, 2020; v1 submitted 17 April, 2020;
originally announced April 2020.
-
Multi-Granularity Canonical Appearance Pooling for Remote Sensing Scene Classification
Authors:
S. Wang,
Y. Guan,
L. Shao
Abstract:
Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise due to the lack of detailed annotations that can be employed to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we hereby propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to au…
▽ More
Recognising remote sensing scene images remains challenging due to large visual-semantic discrepancies. These mainly arise due to the lack of detailed annotations that can be employed to align pixel-level representations with high-level semantic labels. As the tagging process is labour-intensive and subjective, we hereby propose a novel Multi-Granularity Canonical Appearance Pooling (MG-CAP) to automatically capture the latent ontological structure of remote sensing datasets. We design a granular framework that allows progressively cropping the input image to learn multi-grained features. For each specific granularity, we discover the canonical appearance from a set of pre-defined transformations and learn the corresponding CNN features through a maxout-based Siamese style architecture. Then, we replace the standard CNN features with Gaussian covariance matrices and adopt the proper matrix normalisations for improving the discriminative power of features. Besides, we provide a stable solution for training the eigenvalue-decomposition function (EIG) in a GPU and demonstrate the corresponding back-propagation using matrix calculus. Extensive experiments have shown that our framework can achieve promising results in public remote sensing scene datasets.
△ Less
Submitted 9 April, 2020;
originally announced April 2020.
-
Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation
Authors:
Dwarikanath Mahapatra,
Behzad Bozorgtabar,
Jean-Philippe Thiran,
Ling Shao
Abstract:
Medical image segmentation is an important task for computer aided diagnosis. Pixelwise manual annotations of large datasets require high expertise and is time consuming. Conventional data augmentations have limited benefit by not fully representing the underlying distribution of the training set, thus affecting model robustness when tested on images captured from different sources. Prior work lev…
▽ More
Medical image segmentation is an important task for computer aided diagnosis. Pixelwise manual annotations of large datasets require high expertise and is time consuming. Conventional data augmentations have limited benefit by not fully representing the underlying distribution of the training set, thus affecting model robustness when tested on images captured from different sources. Prior work leverages synthetic images for data augmentation ignoring the interleaved geometric relationship between different anatomical labels. We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape. Latent space variable sampling results in diverse generated images from a base image and improves robustness. Given those augmented images generated by our method, we train the segmentation network to enhance the segmentation performance of retinal optical coherence tomography (OCT) images. The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset having images captured from different acquisition procedures. Ablation studies and visual analysis also demonstrate benefits of integrating geometry and diversity.
△ Less
Submitted 25 April, 2020; v1 submitted 31 March, 2020;
originally announced March 2020.
-
CycleISP: Real Image Restoration via Improved Data Synthesis
Authors:
Syed Waqas Zamir,
Aditya Arora,
Salman Khan,
Munawar Hayat,
Fahad Shahbaz Khan,
Ming-Hsuan Yang,
Ling Shao
Abstract:
The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumpti…
▽ More
The availability of large-scale datasets has helped unleash the true potential of deep convolutional neural networks (CNNs). However, for the single-image denoising problem, capturing a real dataset is an unacceptably expensive and cumbersome procedure. Consequently, image denoising algorithms are mostly developed and evaluated on synthetic data that is usually generated with a widespread assumption of additive white Gaussian noise (AWGN). While the CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied on real camera images, as reported in recent benchmark datasets. This is mainly because the AWGN is not adequate for modeling the real camera noise which is signal-dependent and heavily transformed by the camera imaging pipeline. In this paper, we present a framework that models camera imaging pipeline in forward and reverse directions. It allows us to produce any number of realistic image pairs for denoising both in RAW and sRGB spaces. By training a new image denoising network on realistic synthetic data, we achieve the state-of-the-art performance on real camera benchmark datasets. The parameters in our model are ~5 times lesser than the previous best method for RAW denoising. Furthermore, we demonstrate that the proposed framework generalizes beyond image denoising problem e.g., for color matching in stereoscopic cinema. The source code and pre-trained models are available at https://github.com/swz30/CycleISP.
△ Less
Submitted 17 March, 2020;
originally announced March 2020.
-
Motion-Attentive Transition for Zero-Shot Video Object Segmentation
Authors:
Tianfei Zhou,
Shunzhou Wang,
Yi Zhou,
Yazhou Yao,
Jianwu Li,
Ling Shao
Abstract:
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attenti…
▽ More
In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for closely hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation for multi-level encoder features, which is further fed into a decoder to achieve segmentation results. Extensive experiments on three challenging public benchmarks (i.e. DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against the state-of-the-arts.
△ Less
Submitted 9 July, 2020; v1 submitted 9 March, 2020;
originally announced March 2020.
-
Hi-Net: Hybrid-fusion Network for Multi-modal MR Image Synthesis
Authors:
Tao Zhou,
Huazhu Fu,
Geng Chen,
Jianbing Shen,
Ling Shao
Abstract:
Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesi…
▽ More
Magnetic resonance imaging (MRI) is a widely used neuroimaging technique that can provide images of different contrasts (i.e., modalities). Fusing this multi-modal data has proven particularly effective for boosting model performance in many tasks. However, due to poor data quality and frequent patient dropout, collecting all modalities for every patient remains a challenge. Medical image synthesis has been proposed as an effective solution to this, where any missing modalities are synthesized from the existing ones. In this paper, we propose a novel Hybrid-fusion Network (Hi-Net) for multi-modal MR image synthesis, which learns a mapping from multi-modal source images (i.e., existing modalities) to target images (i.e., missing modalities). In our Hi-Net, a modality-specific network is utilized to learn representations for each individual modality, and a fusion network is employed to learn the common latent representation of multi-modal data. Then, a multi-modal synthesis network is designed to densely combine the latent representation with hierarchical features from each modality, acting as a generator to synthesize the target images. Moreover, a layer-wise multi-modal fusion strategy is presented to effectively exploit the correlations among multiple modalities, in which a Mixed Fusion Block (MFB) is proposed to adaptively weight different fusion strategies (i.e., element-wise summation, product, and maximization). Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical image synthesis methods.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.
-
DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images
Authors:
Yi Zhou,
Boyang Wang,
Xiaodong He,
Shanshan Cui,
Ling Shao
Abstract:
Diabetic retinopathy (DR) is a complication of diabetes that severely affects eyes. It can be graded into five levels of severity according to international protocol. However, optimizing a grading model to have strong generalizability requires a large amount of balanced training data, which is difficult to collect particularly for the high severity levels. Typical data augmentation methods, includ…
▽ More
Diabetic retinopathy (DR) is a complication of diabetes that severely affects eyes. It can be graded into five levels of severity according to international protocol. However, optimizing a grading model to have strong generalizability requires a large amount of balanced training data, which is difficult to collect particularly for the high severity levels. Typical data augmentation methods, including random flipping and rotation, cannot generate data with high diversity. In this paper, we propose a diabetic retinopathy generative adversarial network (DR-GAN) to synthesize high-resolution fundus images which can be manipulated with arbitrary grading and lesion information. Thus, large-scale generated data can be used for more meaningful augmentation to train a DR grading and lesion segmentation model. The proposed retina generator is conditioned on the structural and lesion masks, as well as adaptive grading vectors sampled from the latent grading space, which can be adopted to control the synthesized grading severity. Moreover, a multi-scale spatial and channel attention module is devised to improve the generation ability to synthesize details. Multi-scale discriminators are designed to operate from large to small receptive fields, and joint adversarial losses are adopted to optimize the whole network in an end-to-end manner. With extensive experiments evaluated on the EyePACS dataset connected to Kaggle, as well as the FGADR dataset, we validate the effectiveness of our method, which can both synthesize highly realistic (1280 x 1280) controllable fundus images and contribute to the DR grading task.
△ Less
Submitted 11 November, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Semi-Heterogeneous Three-Way Joint Embedding Network for Sketch-Based Image Retrieval
Authors:
Jianjun Lei,
Yuxin Song,
Bo Peng,
Zhanyu Ma,
Ling Shao,
Yi-Zhe Song
Abstract:
Sketch-based image retrieval (SBIR) is a challenging task due to the large cross-domain gap between sketches and natural images. How to align abstract sketches and natural images into a common high-level semantic space remains a key problem in SBIR. In this paper, we propose a novel semi-heterogeneous three-way joint embedding network (Semi3-Net), which integrates three branches (a sketch branch,…
▽ More
Sketch-based image retrieval (SBIR) is a challenging task due to the large cross-domain gap between sketches and natural images. How to align abstract sketches and natural images into a common high-level semantic space remains a key problem in SBIR. In this paper, we propose a novel semi-heterogeneous three-way joint embedding network (Semi3-Net), which integrates three branches (a sketch branch, a natural image branch, and an edgemap branch) to learn more discriminative cross-domain feature representations for the SBIR task. The key insight lies with how we cultivate the mutual and subtle relationships amongst the sketches, natural images, and edgemaps. A semi-heterogeneous feature mapping is designed to extract bottom features from each domain, where the sketch and edgemap branches are shared while the natural image branch is heterogeneous to the other branches. In addition, a joint semantic embedding is introduced to embed the features from different domains into a common high-level semantic space, where all of the three branches are shared. To further capture informative features common to both natural images and the corresponding edgemaps, a co-attention model is introduced to conduct common channel-wise feature recalibration between different domains. A hybrid-loss mechanism is designed to align the three branches, where an alignment loss and a sketch-edgemap contrastive loss are presented to encourage the network to learn invariant cross-domain representations. Experimental results on two widely used category-level datasets (Sketchy and TU-Berlin Extension) demonstrate that the proposed method outperforms state-of-the-art methods.
△ Less
Submitted 9 November, 2019;
originally announced November 2019.
-
Learning to Scaffold the Development of Robotic Manipulation Skills
Authors:
Lin Shao,
Toki Migimatsu,
Jeannette Bohg
Abstract:
Learning contact-rich, robotic manipulation skills is a challenging problem due to the high-dimensionality of the state and action space as well as uncertainty from noisy sensors and inaccurate motor control. To combat these factors and achieve more robust manipulation, humans actively exploit contact constraints in the environment. By adopting a similar strategy, robots can also achieve more robu…
▽ More
Learning contact-rich, robotic manipulation skills is a challenging problem due to the high-dimensionality of the state and action space as well as uncertainty from noisy sensors and inaccurate motor control. To combat these factors and achieve more robust manipulation, humans actively exploit contact constraints in the environment. By adopting a similar strategy, robots can also achieve more robust manipulation. In this paper, we enable a robot to autonomously modify its environment and thereby discover how to ease manipulation skill learning. Specifically, we provide the robot with fixtures that it can freely place within the environment. These fixtures provide hard constraints that limit the outcome of robot actions. Thereby, they funnel uncertainty from perception and motor control and scaffold manipulation skill learning. We propose a learning system that consists of two learning loops. In the outer loop, the robot positions the fixture in the workspace. In the inner loop, the robot learns a manipulation skill and after a fixed number of episodes, returns the reward to the outer loop. Thereby, the robot is incentivised to place the fixture such that the inner loop quickly achieves a high reward. We demonstrate our framework both in simulation and in the real world on three tasks: peg insertion, wrench manipulation and shallow-depth insertion. We show that manipulation skill learning is dramatically sped up through this way of scaffolding.
△ Less
Submitted 5 October, 2020; v1 submitted 3 November, 2019;
originally announced November 2019.
-
A Safety Constrained Control Framework for UAVs in GPS Denied Environment
Authors:
Wenbin Wan,
Hunmin Kim,
Naira Hovakimyan,
Lui Sha,
Petros G. Voulgaris
Abstract:
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can lead to potentially dangerous situations. To avoid intolerable sensor drifts in the presence of GPS spoofing attacks, we propose a safety constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing attacks. The attack detector…
▽ More
Unmanned aerial vehicles (UAVs) suffer from sensor drifts in GPS denied environments, which can lead to potentially dangerous situations. To avoid intolerable sensor drifts in the presence of GPS spoofing attacks, we propose a safety constrained control framework that adapts the UAV at a path re-planning level to support resilient state estimation against GPS spoofing attacks. The attack detector is used to detect GPS spoofing attacks based on the resilient state estimation and provides a switching criterion between the robust control mode and emergency control mode. To quantify the safety margin, we introduce the escape time which is defined as a safe time under which the state estimation error remains within a tolerable error with designated confidence. An attacker location tracker (ALT) is developed to track the location of the attacker and estimate the output power of the spoofing device by the unscented Kalman filter (UKF) with sliding window outputs. Using the estimates from ALT, an escape controller (ESC) is designed based on the constrained model predictive controller (MPC) such that the UAV escapes from the effective range of the spoofing device within the escape time.
△ Less
Submitted 12 April, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
-
Learning Visual Dynamics Models of Rigid Objects using Relational Inductive Biases
Authors:
Fabio Ferreira,
Lin Shao,
Tamim Asfour,
Jeannette Bohg
Abstract:
Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and by using Graph Neural Networks (GNNs) that incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation…
▽ More
Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and by using Graph Neural Networks (GNNs) that incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation task from visual observations involving cluttered and irregularly shaped objects. We investigate two GNN approaches and empirically assess their capability to generalize to scenarios with novel and an increasing number of objects. The first, Graph Networks (GN) based approach, considers explicitly defined edge attributes and not only does it consistently underperform an auto-encoder baseline that we modified to predict future states, our results indicate how different edge attributes can significantly influence the predictions. Consequently, we develop the Auto-Predictor that does not rely on explicitly defined edge attributes. It outperforms the baseline and the GN-based models. Overall, our results show the sensitivity of GNN-based approaches to the task representation, the efficacy of relational inductive biases and advocate choosing lightweight approaches that implicitly reason about relations over ones that leave these decisions to human designers.
△ Less
Submitted 23 October, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.