-
Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology
Authors:
Lianghui Zhu,
Xitong Ling,
Minxi Ouyang,
Xiaoping Liu,
Tian Guan,
Mingxi Fu,
Zhiqiang Cheng,
Fanglei Fu,
Maomao Zeng,
Liming Liu,
Song Duan,
Qiang Huang,
Ying Xiao,
Jianming Li,
Shanming Lu,
Zhenghua Piao,
Mingxi Zhu,
Yibo Jin,
Shan Xu,
Qiming He,
Yizhi Wang,
Junru Cheng,
Xuanyu Wang,
Luxi Xie,
Houqiang Li
, et al. (2 additional authors not shown)
Abstract:
Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterative optimization strategy combining pretraining with fine-screening, specifically designed to address the detection of sparsely distributed lesion areas in whole-slide images. Digepath is pretrained on over 353 million multi-scale images from 210,043 H&E-stained slides of GI diseases. It attains state-of-the-art performance on 33 out of 34 tasks related to GI pathology, including pathological diagnosis, protein expression status prediction, gene mutation prediction, and prognosis evaluation. We further translate the intelligent screening module for early GI cancer and achieve near-perfect 99.70% sensitivity across nine independent medical institutions. This work not only advances AI-driven precision pathology for GI diseases but also bridges critical gaps in histopathological practice.
Submitted 6 June, 2025; v1 submitted 27 May, 2025;
originally announced May 2025.
-
Exploiting Age of Information in Network Digital Twins for AI-driven Real-Time Link Blockage Detection
Authors:
Michele Zhu,
Francesco Linsalata,
Silvia Mura,
Lorenzo Cazzella,
Damiano Badini,
Umberto Spagnolini
Abstract:
Line-of-Sight (LoS) identification is crucial to ensure reliable high-frequency communication links, especially those vulnerable to blockages. Network Digital Twins and Artificial Intelligence are key technologies enabling blockage detection (LoS identification) for high-frequency wireless systems, e.g., >6 GHz. In this work, we enhance Network Digital Twins by incorporating Age of Information (AoI) metrics, a quantification of status update freshness, enabling reliable real-time blockage detection (LoS identification) in dynamic wireless environments. By integrating ray-tracing techniques, we automate large-scale collection and labeling of channel data, specifically tailored to the evolving conditions of the environment. The introduced AoI is integrated with the loss function to prioritize more recent information when fine-tuning deep learning models in case of performance degradation (model drift). The effectiveness of the proposed solution is demonstrated in realistic urban simulations, highlighting the trade-off between input resolution, computational cost, and model performance. A resolution reduction of 4x8 from an original channel sample size of (32, 1024) along the angle and subcarrier dimensions results in a computational speedup of 32 times. The proposed fine-tuning successfully mitigates performance degradation while requiring only 1% of the available data samples, enabling automated and fast mitigation of model drift.
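A back-of-the-envelope sketch of two ideas in this abstract, not the authors' implementation. It assumes the 4x8 reduction means a factor of 4 along the 32-sample angle axis and a factor of 8 along the 1024-sample subcarrier axis, and it uses a hypothetical exponential decay to turn Age of Information into loss weights. The names `reduced_shape`, `size_ratio`, `aoi_weights`, `aoi_weighted_loss`, and the `decay` parameter are illustrative only.

```python
import math


def reduced_shape(shape, factors):
    """Shape after downsampling each axis by an integer factor."""
    return tuple(n // f for n, f in zip(shape, factors))


def size_ratio(shape, factors):
    """Ratio between the original and the reduced number of elements."""
    return math.prod(shape) / math.prod(reduced_shape(shape, factors))


def aoi_weights(ages, decay=0.1):
    """Hypothetical AoI weighting: fresher samples (smaller age) weigh more.

    The exponential form and the decay value are assumptions, not taken
    from the paper. Weights are normalised to sum to 1.
    """
    raw = [math.exp(-decay * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def aoi_weighted_loss(per_sample_losses, ages, decay=0.1):
    """Weighted average of per-sample losses using the AoI weights."""
    weights = aoi_weights(ages, decay)
    return sum(w * loss for w, loss in zip(weights, per_sample_losses))


original = (32, 1024)  # (angle, subcarrier), as stated in the abstract
factors = (4, 8)       # assumed assignment of the 4x8 reduction to the axes
print(reduced_shape(original, factors))  # (8, 128)
print(size_ratio(original, factors))     # 32.0
```

Under these assumptions the input shrinks from 32,768 to 1,024 elements, a factor of 32, which agrees with the quoted speedup if the cost scales roughly linearly with input size.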
Submitted 21 May, 2025;
originally announced May 2025.
-
AI-empowered Real-Time Line-of-Sight Identification via Network Digital Twins
Authors:
Michele Zhu,
Silvia Mura,
Francesco Linsalata,
Lorenzo Cazzella,
Damiano Badini,
Umberto Spagnolini
Abstract:
The identification of Line-of-Sight (LoS) conditions is critical for ensuring reliable high-frequency communication links, which are particularly vulnerable to blockages and rapid channel variations. Network Digital Twins (NDTs) and Ray-Tracing (RT) techniques can significantly automate the large-scale collection and labeling of channel data, tailored to specific wireless environments. This paper examines the quality of Artificial Intelligence (AI) models trained on data generated by Network Digital Twins. We propose and evaluate training strategies for a general-purpose Deep Learning model, demonstrating superior performance compared to the current state-of-the-art. In terms of classification accuracy, our approach outperforms the state-of-the-art Deep Learning model by 5% in very low SNR conditions and by approximately 10% in medium-to-high SNR scenarios. Additionally, the proposed strategies effectively reduce the input size to the Deep Learning model while preserving its performance. The computational cost, measured in floating-point operations (FLOPs) during inference, is reduced by 98.55% relative to state-of-the-art solutions, making it ideal for real-time applications.
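To put the quoted percentage in perspective, the short sketch below converts a percentage reduction into a "times smaller" factor. It is plain arithmetic on the figure stated in the abstract; the helper name `reduction_factor` is illustrative.

```python
def reduction_factor(percent_reduction):
    """How many times smaller a cost becomes after a percentage reduction."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("percent_reduction must be in [0, 100)")
    return 100.0 / (100.0 - percent_reduction)


# A 98.55% reduction leaves 1.45% of the original cost.
print(round(reduction_factor(98.55), 1))  # 69.0
```

In other words, a 98.55% reduction corresponds to roughly 69 times fewer operations than the reference model.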
Submitted 21 May, 2025;
originally announced May 2025.
-
Low-Complexity CNN-Based Classification of Electroneurographic Signals
Authors:
Arek Berc Gokdag,
Silvia Mura,
Antonio Coviello,
Michele Zhu,
Maurizio Magarini,
Umberto Spagnolini
Abstract:
Peripheral nerve interfaces (PNIs) facilitate neural recording and stimulation for treating nerve injuries, but real-time classification of electroneurographic (ENG) signals remains challenging due to constraints on complexity and latency, particularly in implantable devices. This study introduces MobilESCAPE-Net, a lightweight architecture that reduces computational cost while maintaining or slightly improving classification performance. Compared to the state-of-the-art ESCAPE-Net, MobilESCAPE-Net achieves comparable accuracy and F1-score with significantly lower complexity, reducing trainable parameters by 99.9% and floating-point operations by 92.47%, enabling faster inference and real-time processing. Its efficiency makes it well-suited for low-complexity ENG signal classification in resource-constrained environments such as implantable devices.
Submitted 27 April, 2025;
originally announced May 2025.
-
Fast-Powerformer: A Memory-Efficient Transformer for Accurate Mid-Term Wind Power Forecasting
Authors:
Mingyi Zhu,
Zhaoxin Li,
Qiao Lin,
Li Ding
Abstract:
Wind power forecasting (WPF), as a significant research topic within renewable energy, plays a crucial role in enhancing the security, stability, and economic operation of power grids. However, due to the high stochasticity of meteorological factors (e.g., wind speed) and significant fluctuations in wind power output, mid-term wind power forecasting faces a dual challenge of maintaining high accuracy and computational efficiency. To address these issues, this paper proposes an efficient and lightweight mid-term wind power forecasting model, termed Fast-Powerformer. The proposed model is built upon the Reformer architecture, incorporating structural enhancements such as a lightweight Long Short-Term Memory (LSTM) embedding module, an input transposition mechanism, and a Frequency Enhanced Channel Attention Mechanism (FECAM). These improvements enable the model to strengthen temporal feature extraction, optimize dependency modeling across variables, significantly reduce computational complexity, and enhance sensitivity to periodic patterns and dominant frequency components. Experiments on multiple real-world wind farm datasets demonstrate that the proposed Fast-Powerformer achieves superior prediction accuracy and operational efficiency compared to mainstream forecasting approaches. Furthermore, the model exhibits fast inference speed and low memory consumption, highlighting its considerable practical value for real-world deployment scenarios.
Submitted 15 April, 2025;
originally announced April 2025.
-
PathSeqSAM: Sequential Modeling for Pathology Image Segmentation with SAM2
Authors:
Mingyang Zhu,
Yinting Liu,
Mingyu Li,
Jiacheng Wang
Abstract:
Current methods for pathology image segmentation typically treat 2D slices independently, ignoring valuable cross-slice information. We present PathSeqSAM, a novel approach that treats 2D pathology slices as sequential video frames using SAM2's memory mechanisms. Our method introduces a distance-aware attention mechanism that accounts for variable physical distances between slices and employs LoRA for domain adaptation. Evaluated on the KPI Challenge 2024 dataset for glomeruli segmentation, PathSeqSAM demonstrates improved segmentation quality, particularly in challenging cases that benefit from cross-slice context. We have publicly released our code at https://github.com/JackyyyWang/PathSeqSAM.
Submitted 12 April, 2025;
originally announced April 2025.
-
CTI-Unet: Cascaded Threshold Integration for Improved U-Net Segmentation of Pathology Images
Authors:
Mingyang Zhu,
Yuqiu Liang,
Jiacheng Wang
Abstract:
Chronic kidney disease (CKD) is a growing global health concern, necessitating precise and efficient image analysis to aid diagnosis and treatment planning. Automated segmentation of kidney pathology images plays a central role in facilitating clinical workflows, yet conventional segmentation models often require delicate threshold tuning. This paper proposes a novel Cascaded Threshold-Integrated U-Net (CTI-Unet) to overcome the limitations of single-threshold segmentation. By sequentially integrating multiple thresholded outputs, our approach can reconcile noise suppression with the preservation of finer structural details. Experiments on the challenging KPIs2024 dataset demonstrate that CTI-Unet outperforms state-of-the-art architectures such as nnU-Net, Swin-Unet, and CE-Net, offering a robust and flexible framework for kidney pathology image segmentation.
Submitted 7 April, 2025;
originally announced April 2025.
-
Model Predictive Control with Visibility Graphs for Humanoid Path Planning and Tracking Against Adversarial Opponents
Authors:
Ruochen Hou,
Gabriel I. Fernandez,
Mingzhang Zhu,
Dennis W. Hong
Abstract:
In this paper we detail the methods used for obstacle avoidance, path planning, and trajectory tracking that helped us win the adult-sized, autonomous humanoid soccer league in RoboCup 2024. Our team was undefeated for all seated matches and scored 45 goals over 6 games, winning the championship game 6 to 1. During the competition, a major challenge for collision avoidance was the measurement noise coming from bipedal locomotion and a limited field of view (FOV). Furthermore, obstacles would sporadically jump in and out of our planned trajectory. At times our estimator would place our robot inside a hard constraint. Any planner in this competition must also be computationally efficient enough to re-plan and react in real time. This motivated our approach to trajectory generation and tracking. In many scenarios both long-term and short-term planning are needed. To efficiently find a long-term general path that avoids all obstacles, we developed DAVG (Dynamic Augmented Visibility Graphs). DAVG focuses on essential path planning by setting certain regions to be active based on obstacles and the desired goal pose. By augmenting the states in the graph, turning angles are considered, which is crucial for a large soccer-playing robot as turning may be more costly. A trajectory is formed by linearly interpolating between discrete points generated by DAVG. A modified version of model predictive control (MPC), called cf-MPC (Collision-Free MPC), is then used to track this trajectory. This ensures short-term planning. Without having to switch formulations, cf-MPC takes into account the robot dynamics and collision-free constraints. Without a hard switch, the control input can smoothly transition in cases where the noise places our robot inside a constraint boundary. The nonlinear formulation runs at approximately 120 Hz, while the quadratic version achieves around 400 Hz.
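The step of forming a trajectory by linear interpolation between discrete planner points can be sketched as follows. This is a generic illustration, not the authors' code: the 2D waypoints stand in for points that a planner such as DAVG might return, and `interpolate_path` and `points_per_segment` are hypothetical names.

```python
def interpolate_path(waypoints, points_per_segment):
    """Linearly interpolate between consecutive 2D waypoints.

    Each segment contributes `points_per_segment` evenly spaced samples
    starting at its first waypoint; the final waypoint is appended once.
    """
    if points_per_segment < 1:
        raise ValueError("points_per_segment must be at least 1")
    if len(waypoints) < 2:
        return list(waypoints)

    trajectory = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for k in range(points_per_segment):
            t = k / points_per_segment  # fraction of the way along the segment
            trajectory.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    trajectory.append(waypoints[-1])
    return trajectory


# Illustrative waypoints only: an L-shaped path around an obstacle.
path = interpolate_path([(0, 0), (2, 0), (2, 2)], points_per_segment=2)
print(path)
```

A tracking controller would then follow the dense sequence of reference points; the abstract assigns that role to cf-MPC.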
Submitted 29 April, 2025; v1 submitted 2 April, 2025;
originally announced April 2025.
-
SegX: Improving Interpretability of Clinical Image Diagnosis with Segmentation-based Enhancement
Authors:
Yuhao Zhang,
Mingcheng Zhu,
Zhiyao Luo
Abstract:
Deep learning-based medical image analysis faces a significant barrier due to the lack of interpretability. Conventional explainable AI (XAI) techniques, such as Grad-CAM and SHAP, often highlight regions outside clinical interests. To address this issue, we propose Segmentation-based Explanation (SegX), a plug-and-play approach that enhances interpretability by aligning the model's explanation map with clinically relevant areas, leveraging the power of segmentation models. Furthermore, we introduce Segmentation-based Uncertainty Assessment (SegU), a method to quantify the uncertainty of the prediction model by measuring the 'distance' between interpretation maps and clinically significant regions. Our experiments on dermoscopic and chest X-ray datasets show that SegX improves interpretability consistently across modalities, and the certainty score provided by SegU reliably reflects the correctness of the model's predictions. Our approach offers a model-agnostic enhancement to medical image diagnosis towards reliable and interpretable AI in clinical decision-making.
Submitted 14 February, 2025;
originally announced February 2025.
-
Infinite Factorial Linear Dynamical Systems for Transient Signal Detection
Authors:
Jiadi Bao,
Yatong Wang,
Yunjie Li,
Mengtao Zhu,
Shafei Wang
Abstract:
Accurately detecting the transient signal of interest from the background signal is one of the fundamental tasks in signal processing. The most recent approaches assume the existence of a single background source and represent the background signal using a linear dynamical system (LDS). This assumption might fail to capture the complexities of modern electromagnetic environments with multiple sources. To address this limitation, this paper proposes a method for detecting the transient signal in a background composed of an unknown number of emitters. The proposed method consists of two main tasks. First, a Bayesian nonparametric model called the infinite factorial linear dynamical system (IFLDS) is developed. The developed model is based on the sticky Indian buffet process and enables the representation and parameter learning of the unbounded number of background sources. This study also designs a parameter learning method for the IFLDS using slice sampling and particle Gibbs with ancestor sampling. Second, the finite moving average (FMA) stopping time is introduced to minimize the worst-case probability of missed detection, and the statistical performance of the stopping time is investigated. To facilitate the computation of the FMA stopping time, this study derives the factorial Kalman forward filtering (FKFF) method and designs a dependence structure for the underlying model, allowing the stopping time to be defined by a recursive function. Numerical simulations demonstrate the effectiveness of the proposed method and the validity of the theoretical results. The experimental results of the pulse signal detection under the condition of communication interference confirm the effectiveness and superiority of the proposed method.
Submitted 9 January, 2025;
originally announced January 2025.
-
Dynamic High-Order Control Barrier Functions with Diffuser for Safety-Critical Trajectory Planning at Signal-Free Intersections
Authors:
Di Chen,
Ruiguo Zhong,
Kehua Chen,
Zhiwei Shang,
Meixin Zhu,
Edward Chung
Abstract:
Planning safe and efficient trajectories through signal-free intersections presents significant challenges for autonomous vehicles (AVs), particularly in dynamic, multi-task environments with unpredictable interactions and an increased possibility of conflicts. This study aims to address these challenges by developing a unified, robust, adaptive framework to ensure safety and efficiency across three distinct intersection movements: left-turn, right-turn, and straight-ahead. Existing methods often struggle to reliably ensure safety and effectively learn multi-task behaviors from demonstrations in such environments. This study proposes a safety-critical planning method that integrates Dynamic High-Order Control Barrier Functions (DHOCBF) with a diffusion-based model, called Dynamic Safety-Critical Diffuser (DSC-Diffuser). The DSC-Diffuser leverages task-guided planning to enhance efficiency, allowing the simultaneous learning of multiple driving tasks from real-world expert demonstrations. Moreover, the incorporation of goal-oriented constraints significantly reduces displacement errors, ensuring precise trajectory execution. To further ensure driving safety in dynamic environments, the proposed DHOCBF framework dynamically adjusts to account for the movements of surrounding vehicles, offering enhanced adaptability and reduced conservatism compared to traditional control barrier functions. Validity evaluations of DHOCBF, conducted through numerical simulations, demonstrate its robustness in adapting to variations in obstacle velocities, sizes, uncertainties, and locations, effectively maintaining driving safety across a wide range of complex and uncertain scenarios. Comprehensive performance evaluations demonstrate that DSC-Diffuser generates realistic, stable, and generalizable policies, providing flexibility and reliable safety assurance in complex multi-task driving scenarios.
Submitted 31 March, 2025; v1 submitted 29 November, 2024;
originally announced December 2024.
-
Non-Interrupting Rail Track Geometry Measurement System Using UAV and LiDAR
Authors:
Lihao Qiu,
Ming Zhu,
JeeWoong Park,
Yingtao Jiang,
Hualiang Teng
Abstract:
The safety of train operations is largely dependent on the health of rail tracks, necessitating regular and meticulous inspection and maintenance. A significant part of such inspections involves geometric measurements of the tracks to detect any potential problems. Traditional methods for track geometry measurements, while proven to be accurate, require track closures during inspections, and consume a considerable amount of time as the inspection area grows, causing significant disruptions to regular operations. To address this challenge, this paper proposes a track geometry measurement system (TGMS) that utilizes an unmanned aerial vehicle (UAV) platform equipped with a light detection and ranging (LiDAR) sensor. Integrated with a state-of-the-art machine-learning-based computer vision algorithm, and a simultaneous localization and mapping (SLAM) algorithm, this platform can conduct rail geometry inspections seamlessly over a larger area without interrupting rail operations. In particular, this semi- or fully automated measurement is found capable of measuring critical rail geometry irregularities in gauge, curvature, and profile with sub-inch accuracy. Cross-level and warp are not measured due to the absence of gravity data. By eliminating operational interruptions, our system offers a more streamlined, cost-effective, and safer solution for inspecting and maintaining rail infrastructure.
Submitted 25 October, 2024; v1 submitted 28 September, 2024;
originally announced October 2024.
-
Development of a Platform to Enable Real Time, Non-disruptive Testing and Early Fault Detection of Critical High Voltage Transformers and Switchgears in High Speed-rail
Authors:
Jiawei Fan,
Ming Zhu,
Yingtao Jiang,
Hualiang Teng
Abstract:
Partial discharge (PD) incidents can occur in critical components of high-speed rail electric systems, such as transformers and switchgears, due to localized insulation defects that cannot withstand electric stress, leading to potential flashovers. These incidents can escalate over time, resulting in breakdowns, downtime, and safety risks. Fortunately, PD activities emit radio frequency (RF) signals, allowing for the development of a hardware platform for real-time, non-invasive PD detection and monitoring. The system uses an RF antenna and high-speed data acquisition to scan signals across a configurable frequency range (100 MHz to 3 GHz), utilizing intermediate frequency modulation and sliding frequency windows for detailed analysis. When signals exceed a threshold, the system records the events, capturing both raw signal data and spectrum snapshots. Real-time data is streamed to a cloud server, offering remote access through a dedicated smartphone application, enabling maintenance teams to monitor and respond promptly. Laboratory testing has confirmed the system's ability to accurately capture RF signals and provide real-time PD monitoring, enhancing the reliability and safety of high-speed rail infrastructure.
Submitted 1 October, 2024;
originally announced October 2024.
-
Diffusion Models for Intelligent Transportation Systems: A Survey
Authors:
Mingxing Peng,
Kehua Chen,
Xusen Guo,
Qiming Zhang,
Hui Zhong,
Meixin Zhu,
Hai Yang
Abstract:
Intelligent Transportation Systems (ITS) are vital in modern traffic management and optimization, significantly enhancing traffic efficiency and safety. Recently, diffusion models have emerged as transformative tools for addressing complex challenges within ITS. In this paper, we present a comprehensive survey of diffusion models for ITS, covering both theoretical and practical aspects. First, we introduce the theoretical foundations of diffusion models and their key variants, including conditional diffusion models and latent diffusion models, highlighting their suitability for modeling complex, multi-modal traffic data and enabling controllable generation. Second, we outline the primary challenges in ITS and the corresponding advantages of diffusion models, providing readers with a deeper understanding of the intersection between ITS and diffusion models. Third, we offer a multi-perspective investigation of current applications of diffusion models in ITS domains, including autonomous driving, traffic simulation, trajectory prediction, and traffic safety. Finally, we discuss state-of-the-art diffusion model techniques and highlight key ITS research directions that warrant further investigation. Through this structured overview, we aim to provide researchers with a comprehensive understanding of diffusion models for ITS, thereby advancing their future applications in the transportation domain.
Submitted 8 May, 2025; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Conformal Distributed Remote Inference in Sensor Networks Under Reliability and Communication Constraints
Authors:
Meiyi Zhu,
Matteo Zecchin,
Sangwoo Park,
Caili Guo,
Chunyan Feng,
Petar Popovski,
Osvaldo Simeone
Abstract:
This paper presents the communication-constrained distributed conformal risk control (CD-CRC) framework, a novel decision-making framework for sensor networks under communication constraints. Targeting multi-label classification problems, such as segmentation, CD-CRC dynamically adjusts local and global thresholds used to identify significant labels with the goal of ensuring a target false negative rate (FNR), while adhering to communication capacity limits. CD-CRC builds on online exponentiated gradient descent to estimate the relative quality of the observations of different sensors, and on online conformal risk control (CRC) as a mechanism to control local and global thresholds. CD-CRC is proved to offer deterministic worst-case performance guarantees in terms of FNR and communication overhead, while the regret performance in terms of false positive rate (FPR) is characterized as a function of the key hyperparameters. Simulation results highlight the effectiveness of CD-CRC, particularly in communication resource-constrained environments, making it a valuable tool for enhancing the performance and reliability of distributed sensor networks.
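The idea of adjusting a threshold online to steer the false negative rate toward a target can be illustrated with a generic feedback rule. The sketch below is not the CD-CRC algorithm: it assumes a single threshold, label scores in [0, 1], labels selected when their score is at least the threshold, and a fixed step size. The names `false_negative_rate`, `update_threshold`, and `step` are hypothetical.

```python
def false_negative_rate(scores, true_labels, threshold):
    """Fraction of true labels whose score falls below the threshold."""
    if not true_labels:
        return 0.0
    missed = sum(1 for label in true_labels if scores[label] < threshold)
    return missed / len(true_labels)


def update_threshold(threshold, fnr, target_fnr, step=0.1):
    """Lower the threshold when the observed FNR exceeds the target,
    raise it otherwise; the result is clipped to [0, 1]."""
    new_threshold = threshold - step * (fnr - target_fnr)
    return min(1.0, max(0.0, new_threshold))


# Illustrative round: two true labels, one of them missed at threshold 0.5.
scores = {"a": 0.9, "b": 0.4, "c": 0.2}
true_labels = {"a", "b"}
threshold = 0.5
fnr = false_negative_rate(scores, true_labels, threshold)
threshold = update_threshold(threshold, fnr, target_fnr=0.1)
print(fnr, round(threshold, 2))  # 0.5 0.46
```

Because the observed FNR (0.5) exceeds the target (0.1), the threshold drops, so more labels are selected in the next round. The paper additionally handles local and global thresholds and communication limits, which this sketch omits.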
Submitted 24 February, 2025; v1 submitted 12 September, 2024;
originally announced September 2024.
-
Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation
Authors:
Xuran Hu,
Mingzhe Zhu,
Ziqiang Xu,
Zhenpeng Feng,
Ljubisa Stankovic
Abstract:
Generative Adversarial Networks (GANs) have shown tremendous potential in synthesizing a large number of realistic SAR images by learning patterns in the data distribution. Some GANs can achieve image editing by introducing latent codes, demonstrating significant promise in SAR image processing. Compared to traditional SAR image processing methods, editing based on GAN latent space control is entirely unsupervised, allowing image processing to be conducted without any labeled data. Additionally, the information extracted from the data is more interpretable. This paper proposes a novel SAR image processing framework called GAN-based Unsupervised Editing (GUE), aiming to address the following two issues: (1) disentangling semantic directions in the GAN latent space and finding meaningful directions; (2) establishing a comprehensive SAR image processing framework while achieving multiple image processing functions. In the implementation of GUE, we decompose the entangled semantic directions in the GAN latent space by training a carefully designed network. Moreover, we can accomplish multiple SAR image processing tasks (including despeckling, localization, auxiliary identification, and rotation editing) in a single training process without any form of supervision. Extensive experiments validate the effectiveness of the proposed method.
Submitted 2 August, 2024;
originally announced August 2024.
-
Adaptive optical signal-to-noise ratio recovery for long-distance optical fiber transmission
Authors:
Mingwen Zhu,
Shangsu Ding,
Zhixue Li,
Song Yu,
Jianming Shang,
Bin Luo
Abstract:
In long-distance fiber optic transmission, the optical fiber link and erbium-doped fiber amplifiers can introduce excessive noise, which reduces the optical signal-to-noise ratio (OSNR). Narrow-band optical filters can be used to eliminate noise and thereby improve OSNR. However, there is a relative frequency drift between the signal and the narrow-band filter, which leads to filtered signal instability. This paper proposes an adaptive OSNR recovery scheme based on a Fabry-Perot (F-P) cavity with a mode width of 6 MHz. Utilizing the comb filtering of the F-P cavity, the noise around the carrier and sidebands of the signal is filtered out simultaneously. To avoid frequency mismatch, we propose a double-servo scheme to suppress relative frequency drift between the signal and the F-P cavity. We constructed a stable radio frequency transfer system based on passive phase compensation and compared our scheme with other OSNR recovery schemes based on optical filters. Compared to the schemes based on dense wavelength division multiplexing (DWDM) and Waveshaper, our scheme demonstrates an improvement in OSNR of the carrier by at least 12 dB and of the sidebands by at least 23.5 dB. The short-term transfer stability (1 s) is improved by one order of magnitude compared to DWDM and half an order of magnitude compared to Waveshaper. This scheme can be applied to the recovery of signals with low OSNR in long-distance fiber optic transmission, improving signal quality and extending the transmission distance limit.
Submitted 1 August, 2024;
originally announced August 2024.
-
Active-RIS-Aided Covert Communications in NOMA-Inspired ISAC Wireless Systems
Authors:
Miaomiao Zhu,
Pengxu Chen,
Liang Yang,
Alexandros-Apostolos A. Boulogeorgos,
Theodoros A. Tsiftsis,
Hongwu Liu
Abstract:
Non-orthogonal multiple access (NOMA)-inspired integrated sensing and communication (ISAC) facilitates spectrum sharing for radar sensing and NOMA communications, but faces privacy and security challenges due to open wireless propagation. In this paper, an active reconfigurable intelligent surface (RIS) is employed to aid covert communications in a NOMA-inspired ISAC wireless system with the aim of maximizing the covert rate. Specifically, a dual-function base station (BS) transmits the superposition signal to sense multiple targets, while achieving covert and reliable communications for a pair of NOMA covert and public users, respectively, in the presence of a warden. Two superposition transmission schemes, namely, transmission with a dedicated sensing signal (w-DSS) and without a dedicated sensing signal (w/o-DSS), are considered in the formulations of the joint transmission and reflection beamforming optimization problems. Numerical results demonstrate that the active-RIS-aided NOMA-ISAC system outperforms its passive-RIS-aided and RIS-free counterparts in terms of covert rate and the trade-off between covert communication and sensing performance metrics. Finally, the w/o-DSS scheme, which omits the dedicated sensing signal, achieves a higher covert rate than the w-DSS scheme by allocating more transmit power to the covert transmissions, while preserving comparable multi-target sensing performance.
Submitted 29 June, 2024;
originally announced July 2024.
-
Toward Real-Time Digital Twins of EM Environments: Computational Benchmark of Ray Launching Software
Authors:
Michele Zhu,
Lorenzo Cazzella,
Francesco Linsalata,
Maurizio Magarini,
Matteo Matteucci,
Umberto Spagnolini
Abstract:
Digital Twin has emerged as a promising paradigm for accurately representing wireless communication electromagnetic environments. The resulting virtual representation of reality facilitates comprehensive insights into the propagation environment, empowering multi-layer decision-making processes at the physical communication level. This paper investigates the impact of ray-based model simulation within real-time Digital Twins. A benchmark for ray-based propagation simulations is presented to evaluate computational time, considering two urban scenarios characterized by different mesh complexity, single and multiple wireless link configurations, and simulations with/without diffuse scattering. Exhaustive empirical analyses are performed showing the behavior of different ray-based solutions. By offering standardized simulations and scenarios, this work provides a technical benchmark for practitioners involved in the implementation of real-time Digital Twins and optimization of ray-based propagation models.
Submitted 2 October, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Representation and De-interleaving of Mixtures of Hidden Markov Processes
Authors:
Jiadi Bao,
Mengtao Zhu,
Yunjie Li,
Shafei Wang
Abstract:
De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consuming. To address these issues, this paper proposes a novel representation model and corresponding de-interleaving methods for the mixtures of HMPs. At first, a generative model for representing the mixtures of HMPs is designed. Subsequently, the de-interleaving process is formulated as a posterior inference for the generative model. Secondly, an exact inference method is developed to maximize the likelihood of the complete data, and two approximate inference methods are developed to maximize the evidence lower bound by creating tractable structures. Then, a theoretical error probability lower bound is derived using the likelihood ratio test, and the algorithms are shown to get reasonably close to the bound. Finally, simulation results demonstrate that the proposed methods are highly effective and robust for non-ideal situations, outperforming baseline methods on simulated and real-life data.
Submitted 1 June, 2024;
originally announced June 2024.
-
Rethinking Grant-Free Protocol in mMTC
Authors:
Minhao Zhu,
Yifei Sun,
Lizhao You,
Zhaorui Wang,
Ya-Feng Liu,
Shuguang Cui
Abstract:
This paper revisits the identity detection problem under the current grant-free protocol in massive machine-type communications (mMTC) by asking the following question: for stable identity detection performance, is it enough to permit active devices to transmit preambles without any handshaking with the base station (BS)? Specifically, in the current grant-free protocol, the BS blindly allocates a fixed length of preamble to devices for identity detection as it lacks the prior information on the number of active devices $K$. However, in practice, $K$ varies dynamically over time, resulting in degraded identity detection performance especially when $K$ is large. Consequently, the current grant-free protocol fails to ensure stable identity detection performance. To address this issue, we propose a two-stage communication protocol which consists of estimation of $K$ in Phase I and detection of identities of active devices in Phase II. The preamble length for identity detection in Phase II is dynamically allocated based on the estimated $K$ in Phase I through a table lookup manner such that the identity detection performance could always be better than a predefined threshold. In addition, we design an algorithm for estimating $K$ in Phase I, and exploit the estimated $K$ to reduce the computational complexity of the identity detector in Phase II. Numerical results demonstrate the effectiveness of the proposed two-stage communication protocol and algorithms.
Submitted 24 April, 2024;
originally announced April 2024.
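The table-lookup step described in the abstract above (choosing the Phase II preamble length from the Phase I estimate of $K$) can be illustrated with a minimal sketch. The table entries, the name `PREAMBLE_TABLE`, and the function `preamble_length` are hypothetical placeholders chosen for illustration; the abstract does not give the actual table or thresholds.

```python
from bisect import bisect_left

# Hypothetical lookup table: (largest K covered, preamble length in symbols).
# In practice such a table would be precomputed so that detection performance
# stays above the predefined threshold; these numbers are placeholders only.
PREAMBLE_TABLE = [(50, 64), (100, 128), (200, 256), (400, 512)]


def preamble_length(k_est: int) -> int:
    """Return the shortest tabulated preamble length that covers k_est devices."""
    if k_est < 0:
        raise ValueError("estimated number of active devices must be non-negative")
    bounds = [k_max for k_max, _ in PREAMBLE_TABLE]
    idx = bisect_left(bounds, k_est)
    # Clamp to the longest preamble when k_est exceeds the largest tabulated K.
    idx = min(idx, len(PREAMBLE_TABLE) - 1)
    return PREAMBLE_TABLE[idx][1]


if __name__ == "__main__":
    for k in (10, 50, 51, 1000):
        print(k, preamble_length(k))
```

The sketch only shows the dynamic allocation idea: a larger estimated $K$ maps to a longer preamble, instead of the fixed length used by the current grant-free protocol.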
-
Federated reinforcement learning for robot motion planning with zero-shot generalization
Authors:
Zhenyuan Yuan,
Siyuan Xu,
Minghui Zhu
Abstract:
This paper considers the problem of learning a control policy for robot motion planning with zero-shot generalization, i.e., no data collection or policy adaptation is needed when the learned policy is deployed in new environments. We develop a federated reinforcement learning framework that enables collaborative learning of multiple learners and a central server, i.e., the Cloud, without sharing their raw data. In each iteration, each learner uploads its local control policy and the corresponding estimated normalized arrival time to the Cloud, which then computes the global optimum among the learners and broadcasts the optimal policy to the learners. Each learner then selects between its local control policy and that from the Cloud for the next iteration. The proposed framework leverages the derived zero-shot generalization guarantees on arrival time and safety. Theoretical guarantees on almost-sure convergence, almost consensus, Pareto improvement and optimality gap are also provided. Monte Carlo simulations are conducted to evaluate the proposed framework.
Submitted 7 April, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
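One iteration of the upload, select, and broadcast loop described in the abstract above could look roughly like the following sketch. It assumes that a lower estimated normalized arrival time is better and that each learner keeps whichever of the two policies has the lower estimate; the paper's actual selection rule (which also involves safety guarantees) is not given in the abstract. The names `Upload`, `cloud_select`, and `learner_update` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Upload:
    learner_id: int
    policy: str          # stand-in for the local policy parameters
    arrival_time: float  # estimated normalized arrival time (assumed: lower is better)


def cloud_select(uploads: Sequence[Upload]) -> Upload:
    """Cloud side: pick the upload with the smallest estimated arrival time."""
    return min(uploads, key=lambda u: u.arrival_time)


def learner_update(local: Upload, broadcast: Upload) -> Upload:
    """Learner side: keep the local policy unless the broadcast one is strictly better."""
    return broadcast if broadcast.arrival_time < local.arrival_time else local


if __name__ == "__main__":
    uploads = [Upload(0, "pi_0", 0.8), Upload(1, "pi_1", 0.5), Upload(2, "pi_2", 0.9)]
    best = cloud_select(uploads)
    print([learner_update(u, best).policy for u in uploads])
```

Only policies and scalar estimates cross the network in this sketch, which mirrors the abstract's point that raw data is never shared.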
-
On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks
Authors:
Meiyi Zhu,
Caili Guo,
Chunyan Feng,
Osvaldo Simeone
Abstract:
In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.
Submitted 8 June, 2025; v1 submitted 16 February, 2024;
originally announced February 2024.
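For readers unfamiliar with the likelihood ratio attack analyzed in the abstract above, the following is a minimal sketch of the commonly used LiRA scoring step in the true label confidence (TLC) setting. It assumes the attacker has true-label confidences from shadow models trained with and without the target point, and fits one Gaussian to each set in logit space. This is a generic illustration, not the information-theoretic analysis of the paper; `lira_score` and its arguments are hypothetical names.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def logit(p: float, eps: float = 1e-6) -> float:
    """Map a confidence in (0, 1) to logit space, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of a univariate Gaussian."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def lira_score(target_conf: float, in_confs: Sequence[float], out_confs: Sequence[float]) -> float:
    """Log-likelihood ratio; positive values favor 'member'. Needs >= 2 samples per set."""
    x = logit(target_conf)
    ins = [logit(c) for c in in_confs]
    outs = [logit(c) for c in out_confs]
    # Floor the standard deviations so a degenerate sample does not divide by zero.
    s_in = max(stdev(ins), 1e-3)
    s_out = max(stdev(outs), 1e-3)
    return gaussian_logpdf(x, mean(ins), s_in) - gaussian_logpdf(x, mean(outs), s_out)


if __name__ == "__main__":
    in_confs = [0.97, 0.98, 0.99]   # made-up shadow confidences (point in training set)
    out_confs = [0.60, 0.70, 0.80]  # made-up shadow confidences (point held out)
    print(lira_score(0.98, in_confs, out_confs))
    print(lira_score(0.70, in_confs, out_confs))
```

The attacker declares membership when the score exceeds a threshold; an overconfident target model widens the gap between the two Gaussians and makes the test easier, which is the effect the paper's bounds quantify.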
-
A Comprehensive Approach to Diagnosing Temporomandibular Joint Diseases: AI-driven TMD Diagnostic System
Authors:
Y. Gu,
C. T. Kong,
D. D. Zhang,
Y. J. Bai,
J. K. H. Tsoi,
Hua Huang,
Y. Q. Deng,
Y. M. Zhu
Abstract:
The AI-driven TMD diagnostic system uses an AI segmentation method to diagnose Temporomandibular Joint Disorders (TMD). Segmentation identifies three important structures: the temporal bone, the temporomandibular joint (TMJ) disc, and the condyle. The location and size of each segment are used as the basic information to determine whether the patient has a high chance of having TMD.
Submitted 4 February, 2024;
originally announced February 2024.
-
SAR Despeckling via Regional Denoising Diffusion Probabilistic Model
Authors:
Xuran Hu,
Ziqiang Xu,
Zhihan Chen,
Zhengpeng Feng,
Mingzhe Zhu,
Ljubisa Stankovic
Abstract:
Speckle noise poses a significant challenge in maintaining the quality of synthetic aperture radar (SAR) images, so SAR despeckling techniques have drawn increasing attention. Despite the tremendous advancements of deep learning in fixed-scale SAR image despeckling, these methods still struggle to deal with large-scale SAR images. To address this problem, this paper introduces a novel despeckling approach termed Region Denoising Diffusion Probabilistic Model (R-DDPM) based on generative models. R-DDPM enables versatile despeckling of SAR images across various scales, accomplished within a single training session. Moreover, artifacts in the fused SAR images can be effectively avoided by using region-guided inverse sampling. Experiments with the proposed R-DDPM on Sentinel-1 data demonstrate superior performance over existing methods.
Submitted 5 January, 2024;
originally announced January 2024.
-
iPolicy: Incremental Policy Algorithms for Feedback Motion Planning
Authors:
Guoxiang Zhao,
Devesh K. Jha,
Yebin Wang,
Minghui Zhu
Abstract:
This paper presents policy-based motion planning for robotic systems. The motion planning literature has been mostly focused on open-loop trajectory planning which is followed by tracking online. In contrast, we solve the problem of path planning and controller synthesis simultaneously by solving the related feedback control problem. We present a novel incremental policy (iPolicy) algorithm for motion planning, which integrates sampling-based methods and set-valued optimal control methods to compute feedback controllers for the robotic system. In particular, we use sampling to incrementally construct the state space of the system. Asynchronous value iterations are performed on the sampled state space to synthesize the incremental policy feedback controller. We show the convergence of the estimates to the optimal value function in continuous state space. Numerical results with various different dynamical systems (including nonholonomic systems) verify the optimality and effectiveness of iPolicy.
Submitted 5 January, 2024;
originally announced January 2024.
-
Class Information Guided Reconstruction for Automatic Modulation Open-Set Recognition
Authors:
Ziwei Zhang,
Mengtao Zhu,
Jiabin Liu,
Yunjie Li,
Shafei Wang
Abstract:
Automatic Modulation Recognition (AMR) is a crucial technology in the domains of radar and communications. Traditional AMR approaches assume a closed-set scenario, where unknown samples are forcibly misclassified into known classes, leading to serious consequences for situation awareness and threat assessment. To address this issue, Automatic Modulation Open-set Recognition (AMOSR) defines two tasks as Known Class Classification (KCC) and Unknown Class Identification (UCI). However, AMOSR faces core challenges in terms of inappropriate decision boundaries and sparse feature distributions. To overcome the aforementioned challenges, we propose a Class Information guided Reconstruction (CIR) framework, which leverages reconstruction losses to distinguish known and unknown classes. To enhance distinguishability, we design Class Conditional Vectors (CCVs) to match the latent representations extracted from input samples, achieving perfect reconstruction for known samples while yielding poor results for unknown ones. We also propose a Mutual Information (MI) loss function to ensure reliable matching, with upper and lower bounds of MI derived for tractable optimization and mathematical proofs provided. The mutually beneficial CCVs and MI facilitate the CIR attaining optimal UCI performance without compromising KCC accuracy, especially in scenarios with a higher proportion of unknown classes. Additionally, a denoising module is introduced before reconstruction, enabling the CIR to achieve a significant performance improvement at low SNRs. Experimental results on simulated and measured signals validate the effectiveness and the robustness of the proposed method.
Submitted 14 April, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar
Authors:
Yang Sui,
Minning Zhu,
Lingyi Huang,
Chung-Tse Michael Wu,
Bo Yuan
Abstract:
Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank decomposition to transform a large-scale RFNN into a compact RFNN while almost preserving its accuracy. Specifically, we develop a Tensor-Train RFNN (TT-RFNN) where each layer comprises a sequence of low-rank third-order tensors, leading to a notable reduction in parameter count, thereby optimizing RF interferometer utilization in comparison to the original large-scale RFNN. Additionally, considering the inherent physical errors when mapping TT-RFNN to RF device parameters in real-world deployment, from a general perspective, we construct the Robust TT-RFNN (RTT-RFNN) by incorporating a robustness solver on TT-RFNN to enhance its robustness. To adapt the RTT-RFNN to varying requirements of reshaping operations, we further provide a reconfigurable reshaping solution employing RF switch matrices. Empirical evaluations conducted on MNIST and CIFAR-10 datasets show the effectiveness of our proposed method.
Submitted 16 December, 2023;
originally announced December 2023.
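The parameter reduction claimed for the Tensor-Train layer in the abstract above can be made concrete with a small counting sketch. It assumes the standard tensor-train format, in which a weight tensor with mode sizes n_1, ..., n_d is stored as third-order cores of shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1. The mode sizes and ranks below are illustrative choices, not values from the paper, and `tt_param_count` is a hypothetical helper.

```python
from math import prod
from typing import Sequence


def tt_param_count(modes: Sequence[int], ranks: Sequence[int]) -> int:
    """Number of parameters in a tensor-train with third-order cores.

    modes: mode sizes n_1..n_d of the reshaped weight tensor.
    ranks: TT ranks r_0..r_d, with r_0 = r_d = 1.
    """
    if len(ranks) != len(modes) + 1:
        raise ValueError("ranks must have one more entry than modes")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must equal 1")
    # Core k has shape (r_{k-1}, n_k, r_k).
    return sum(ranks[k] * modes[k] * ranks[k + 1] for k in range(len(modes)))


if __name__ == "__main__":
    # Illustrative example: a 784 x 256 dense layer reshaped to modes 28, 28, 16, 16.
    modes = [28, 28, 16, 16]
    ranks = [1, 8, 8, 8, 1]
    dense = prod(modes)
    compact = tt_param_count(modes, ranks)
    print(dense, compact, dense / compact)
```

Under these assumed sizes the dense layer has 200,704 weights while the cores hold 3,168, which illustrates why fewer RF interferometers would be needed; the achievable accuracy at a given rank is an empirical question the sketch does not address.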
-
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
Authors:
Jin Ye,
Junlong Cheng,
Jianpin Chen,
Zhongying Deng,
Tianbin Li,
Haoyu Wang,
Yanzhou Su,
Ziyan Huang,
Jilong Chen,
Lei Jiang,
Hui Sun,
Min Zhu,
Shaoting Zhang,
Junjun He,
Yu Qiao
Abstract:
Segment Anything Model (SAM) has achieved impressive results for natural image segmentation with input prompts such as points and bounding boxes. Its success largely owes to massive labeled training data. However, directly applying SAM to medical image segmentation cannot perform well because SAM lacks medical knowledge -- it does not use medical images for training. To incorporate medical knowledge into SAM, we introduce SA-Med2D-20M, a large-scale segmentation dataset of 2D medical images built upon numerous public and private datasets. It consists of 4.6 million 2D medical images and 19.7 million corresponding masks, covering almost the whole body and showing significant diversity. This paper describes all the datasets collected in SA-Med2D-20M and details how to process these datasets. Furthermore, comprehensive statistics of SA-Med2D-20M are presented to facilitate the better use of our dataset, which can help the researchers build medical vision foundation models or apply their models to downstream medical applications. We hope that the large scale and diversity of SA-Med2D-20M can be leveraged to develop medical artificial intelligence for enhancing diagnosis, medical image analysis, knowledge sharing, and education. The data with the redistribution license is publicly available at https://github.com/OpenGVLab/SAM-Med2D.
Submitted 20 November, 2023;
originally announced November 2023.
-
Contrastive Self-Supervised Learning for Spatio-Temporal Analysis of Lung Ultrasound Videos
Authors:
Li Chen,
Jonathan Rubin,
Jiahong Ouyang,
Naveen Balaraju,
Shubham Patil,
Courosh Mehanian,
Sourabh Kulhare,
Rachel Millin,
Kenton W Gregory,
Cynthia R Gregory,
Meihua Zhu,
David O Kessler,
Laurie Malia,
Almaz Dessie,
Joni Rabiner,
Di Coneybeare,
Bo Shopsin,
Andrew Hersh,
Cristian Madar,
Jeffrey Shupp,
Laura S Johnson,
Jacob Avila,
Kristin Dwyer,
Peter Weimersheimer,
Balasundar Raju
, et al. (2 additional authors not shown)
Abstract:
Self-supervised learning (SSL) methods have shown promise for medical imaging applications by learning meaningful visual representations, even when the amount of labeled data is limited. Here, we extend state-of-the-art contrastive learning SSL methods to 2D+time medical ultrasound video data by introducing a modified encoder and augmentation method capable of learning meaningful spatio-temporal representations, without requiring constraints on the input data. We evaluate our method on the challenging clinical task of identifying lung consolidations (an important pathological feature) in ultrasound videos. Using a multi-center dataset of over 27k lung ultrasound videos acquired from over 500 patients, we show that our method can significantly improve performance on downstream localization and classification of lung consolidation. Comparisons against baseline models trained without SSL show that the proposed methods are particularly advantageous when the size of labeled training data is limited (e.g., as little as 5% of the training set).
Submitted 14 October, 2023;
originally announced October 2023.
-
RT-SRTS: Angle-Agnostic Real-Time Simultaneous 3D Reconstruction and Tumor Segmentation from Single X-Ray Projection
Authors:
Miao Zhu,
Qiming Fu,
Bo Liu,
Mengxi Zhang,
Bojian Li,
Xiaoyan Luo,
Fugen Zhou
Abstract:
Radiotherapy is one of the primary treatment methods for tumors, but the organ movement caused by respiration limits its accuracy. Recently, 3D imaging from a single X-ray projection has received extensive attention as a promising approach to address this issue. However, current methods can only reconstruct 3D images without directly locating the tumor and are only validated for fixed-angle imaging, which fails to fully meet the requirements of motion control in radiotherapy. In this study, we propose RT-SRTS, a novel imaging method that integrates 3D imaging and tumor segmentation into one network based on multi-task learning (MTL) and achieves real-time simultaneous 3D reconstruction and tumor segmentation from a single X-ray projection at any angle. Furthermore, attention enhanced calibrator (AEC) and uncertain-region elaboration (URE) modules are proposed to aid feature extraction and improve segmentation accuracy. The proposed method was evaluated on fifteen patient cases and compared with three state-of-the-art methods. It not only delivers superior 3D reconstruction but also demonstrates commendable tumor segmentation results. Simultaneous reconstruction and segmentation can be completed in approximately 70 ms, significantly faster than the required time threshold for real-time tumor tracking. The efficacy of both AEC and URE has also been validated in ablation studies. The code is available at https://github.com/ZywooSimple/RT-SRTS.
Submitted 28 March, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Joint Data Collection and Sensor Positioning in Multi-UAV-Assisted Wireless Sensor Network
Authors:
Mingyue Zhu,
Zhiqing Wei,
Chen Qiu,
Wangjun Jiang,
Huici Wu,
Zhiying Feng
Abstract:
Due to the high mobility and easy deployment, unmanned aerial vehicles (UAVs) have attracted much attention in the field of wireless communication and positioning. To meet the challenges of lack of infrastructure coverage, uncertain sensor position and large amount of sensing data collection in wireless sensor network (WSN), this paper presents an efficient joint data collection and sensor positioning scheme for WSN supported by multiple UAVs. Specifically, a UAV is set as the main UAV to collect data, and other UAVs are used as auxiliary UAVs for sensor positioning using time difference of arrival (TDoA). A mixed-integer non-convex optimization problem with uncertain sensor position is established. The goal is to minimize the average positioning error of all sensors by jointly optimizing the UAV trajectories, sensor transmission schedule and positioning observation points (POPs). To solve this optimization model, the original problem is decomposed into two sub-problems based on the path discrete method. Firstly, the block coordinate descent (BCD) and successive convex approximation (SCA) techniques are applied to iteratively optimize the trajectory of the main UAV and the sensor transmission schedule, so as to maximize the minimum amount of data uploaded by the sensor. Then, based on the trajectory of the main UAV, a particle swarm optimization (PSO)-based algorithm is designed to optimize the POPs of UAVs. Finally, the spline curve is applied to generate the trajectories of auxiliary UAVs. The simulation results show that the proposed scheme can meet the requirements of data collection and has a good positioning performance.
Submitted 13 August, 2023;
originally announced August 2023.
-
Weakly Semi-Supervised Detection in Lung Ultrasound Videos
Authors:
Jiahong Ouyang,
Li Chen,
Gary Y. Li,
Naveen Balaraju,
Shubham Patil,
Courosh Mehanian,
Sourabh Kulhare,
Rachel Millin,
Kenton W. Gregory,
Cynthia R. Gregory,
Meihua Zhu,
David O. Kessler,
Laurie Malia,
Almaz Dessie,
Joni Rabiner,
Di Coneybeare,
Bo Shopsin,
Andrew Hersh,
Cristian Madar,
Jeffrey Shupp,
Laura S. Johnson,
Jacob Avila,
Kristin Dwyer,
Peter Weimersheimer,
Balasundar Raju
, et al. (2 additional authors not shown)
Abstract:
Frame-by-frame annotation of bounding boxes by clinical experts is often required to train fully supervised object detection models on medical video data. We propose a method for improving object detection in medical videos through weak supervision from video-level labels. More concretely, we aggregate individual detection predictions into video-level predictions and extend a teacher-student training strategy to provide additional supervision via a video-level loss. We also introduce improvements to the underlying teacher-student framework, including methods to improve the quality of pseudo-labels based on weak supervision and adaptive schemes to optimize knowledge transfer between the student and teacher networks. We apply this approach to the clinically important task of detecting lung consolidations (seen in respiratory infections such as COVID-19 pneumonia) in medical ultrasound videos. Experiments reveal that our framework improves detection accuracy and robustness compared to baseline semi-supervised models, and improves efficiency in data and annotation usage.
Submitted 7 August, 2023;
originally announced August 2023.
-
SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks
Authors:
Junlong Cheng,
Chengrui Gao,
Fengjie Wang,
Min Zhu
Abstract:
Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-range dependencies in convolution operations, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of the encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problems and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59% and 76% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.
Submitted 21 July, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Multi-Robot-Guided Crowd Evacuation: Two-Scale Modeling and Control
Authors:
Tongjia Zheng,
Zhenyuan Yuan,
Mollik Nayyar,
Alan R. Wagner,
Minghui Zhu,
Hai Lin
Abstract:
Emergency evacuation describes a complex situation involving time-critical decision-making by evacuees. Mobile robots are being actively explored as a potential solution to provide timely guidance. In this work, we study a robot-guided crowd evacuation problem where a small group of robots is used to guide a large human crowd to safe locations. The challenge lies in how to use micro-level human-robot interactions to indirectly influence a population that significantly outnumbers the robots to achieve the collective evacuation objective. To address the challenge, we follow a two-scale modeling strategy and explore hydrodynamic models, which consist of a family of microscopic social force models that describe how human movements are locally affected by other humans, the environment, and robots, and associated macroscopic equations for the temporal and spatial evolution of the crowd density and flow velocity. We design controllers for the robots such that they not only automatically explore the environment (with unknown dynamic obstacles) to cover it as much as possible, but also dynamically adjust the directions of their local navigation force fields based on the real-time macrostates of the crowd to guide the crowd to a safe location. We prove the stability of the proposed evacuation algorithm and conduct extensive simulations to investigate the performance of the algorithm with different combinations of human numbers, robot numbers, and obstacle settings.
Submitted 11 January, 2024; v1 submitted 28 February, 2023;
originally announced February 2023.
-
Bayesian Non-parametric Hidden Markov Model for Agile Radar Pulse Sequences Streaming Analysis
Authors:
Jiadi Bao,
Yunjie Li,
Mengtao Zhu,
Shafei Wang
Abstract:
Multi-function radars (MFRs) are sophisticated sensors capable of complex agile inter-pulse modulation and dynamic work mode scheduling. The developments in MFRs pose great challenges to modern electronic reconnaissance systems or radar warning receivers in recognizing and inferring MFR work modes. To address this issue, this paper proposes an online processing framework for parameter estimation and change point detection of MFR work modes. First, this paper designs a fully conjugate Bayesian non-parametric hidden Markov model with a designed prior distribution (agile BNP-HMM) to represent the MFR pulse agility characteristics. The proposed model allows fully variational Bayesian inference. The proposed framework consists of two main parts. The first part is the agile BNP-HMM, which automatically infers the number of HMM hidden states and the emission distributions of the corresponding hidden states. A lower bound on the estimation error is derived, and the proposed algorithm is shown to be close to the bound. The second part uses streaming Bayesian updating to facilitate computation and provides an online work mode change detection framework based on a weighted sequential probability ratio test. We demonstrate that the proposed framework is consistently effective and robust compared with baseline methods on diverse simulated datasets.
Submitted 22 August, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
A deep local attention network for pre-operative lymph node metastasis prediction in pancreatic cancer via multiphase CT imaging
Authors:
Zhilin Zheng,
Xu Fang,
Jiawen Yao,
Mengmeng Zhu,
Le Lu,
Lingyun Huang,
Jing Xiao,
Yu Shi,
Hong Lu,
Jianping Lu,
Ling Zhang,
Chengwei Shao,
Yun Bian
Abstract:
Lymph node (LN) metastasis status is one of the most critical prognostic and cancer staging factors for patients with resectable pancreatic ductal adenocarcinoma (PDAC) and, more generally, for any type of solid malignant tumor. Preoperative prediction of LN metastasis from non-invasive CT imaging is highly desirable, as it could be used directly to guide subsequent neoadjuvant treatment decisions and surgical planning. Most studies capture only the tumor characteristics in CT imaging to implicitly infer LN metastasis, and very few works exploit the CT imaging information of the LNs directly. To the best of our knowledge, this is the first work to propose a fully automated LN segmentation and identification network to directly facilitate the LN metastasis status prediction task. Nevertheless, LN segmentation/detection is very challenging, since LNs can easily be confused with other hard negative anatomic structures (e.g., vessels) in radiological images. We explore the anatomical spatial context priors of pancreatic LN locations by generating a guiding attention map from related organs and vessels to assist segmentation and infer LN status. As such, LN segmentation is impelled to focus on regions that are anatomically adjacent or plausible with respect to the specific organs and vessels. The metastasized LN identification network is trained to classify the segmented LN instances into positives or negatives by reusing the segmentation network as a pre-trained backbone and adding a new classification head. More importantly, we develop an LN metastasis status prediction network that combines the patient-wise aggregation results of LN segmentation/identification and deep imaging features extracted from the tumor region. Extensive quantitative nested five-fold cross-validation is conducted on a discovery dataset of 749 patients with PDAC.
Submitted 4 January, 2023;
originally announced January 2023.
-
High-Quality Real-Time Rendering Using Subpixel Sampling Reconstruction
Authors:
Boyu Zhang,
Hongliang Yuan,
Mingyan Zhu,
Ligang Liu,
Jue Wang
Abstract:
Generating high-quality, realistic rendering images for real-time applications generally requires tracing a few samples-per-pixel (spp) and using deep learning-based approaches to denoise the resulting low-spp images. Existing denoising methods have yet to achieve real-time performance at high resolutions due to the physically-based sampling and network inference time costs. In this paper, we propose a novel Monte Carlo sampling strategy to accelerate the sampling process and a corresponding denoiser, subpixel sampling reconstruction (SSR), to obtain high-quality images. Extensive experiments demonstrate that our method significantly outperforms previous approaches in denoising quality and reduces overall time costs, enabling real-time rendering capabilities at 2K resolution.
Submitted 25 June, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Information Bottleneck-Inspired Type Based Multiple Access for Remote Estimation in IoT Systems
Authors:
Meiyi Zhu,
Chunyan Feng,
Caili Guo,
Nan Jiang,
Osvaldo Simeone
Abstract:
Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference. In TBMA, codewords are reused across transmitting sensors, with each codeword being assigned to a different observation value. Existing TBMA protocols are based on fixed shared codebooks and on conventional maximum-likelihood or Bayesian decoders, which require knowledge of the distributions of observations and channels. In this letter, we propose a novel design principle for TBMA based on the information bottleneck (IB). In the proposed IB-TBMA protocol, the shared codebook is jointly optimized with a decoder based on artificial neural networks (ANNs), so as to adapt to source, observations, and channel statistics based on data only. We also introduce the Compressed IB-TBMA (CIB-TBMA) protocol, which improves IB-TBMA by enabling a reduction in the number of codewords via an IB-inspired clustering phase. Numerical results demonstrate the importance of a joint design of codebook and neural decoder, and validate the benefits of codebook compression.
Submitted 5 April, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
iNavFIter-M: Matrix Formulation of Functional Iteration for Inertial Navigation Computation
Authors:
Hongyan Jiang,
Maoran Zhu,
Yanyan Fu,
Yuanxin Wu
Abstract:
The acquisition of attitude, velocity, and position is an essential task in the field of inertial navigation, achieved by integrating the measurements from inertial sensors. Recently, ultra-precision inertial navigation computation has been tackled by the functional iteration approach (iNavFIter), which drives the non-commutativity errors almost to the computer truncation error level. This paper proposes a computationally efficient matrix formulation of the functional iteration approach, named iNavFIter-M. The Chebyshev polynomial coefficients in two consecutive iterations are explicitly connected through the matrix formulation, in contrast to the implicit iterative relationship in the original iNavFIter. This allows a straightforward algorithmic implementation, and a number of matrix factors can be pre-calculated for more efficient computation. Numerical results demonstrate that the proposed iNavFIter-M algorithm achieves the same high computation accuracy as the original iNavFIter, at a computational cost comparable to that of the typical two-sample algorithm. The iNavFIter-M algorithm is also implemented on an FPGA board to demonstrate its potential in real-time applications.
Submitted 16 November, 2022;
originally announced November 2022.
-
Learning Critical Scenarios in Feedback Control Systems for Automated Driving
Authors:
Mengjia Zhu,
Alberto Bemporad,
Maximilian Kneissl,
Hasan Esen
Abstract:
Testing is essential for verifying and validating control designs, especially in safety-critical applications. In particular, the control system governing an automated driving vehicle must be proven reliable enough for its acceptance on the market. Recently, much research has focused on scenario-based methods. However, the number of possible driving scenarios to test is in principle infinite. In this paper, we formalize a learning-based optimization framework to generate corner test-cases, where we take into account the operational design domain. We examine the approach on the case of a feedback control system for automated driving, for which we suggest the design of the objective function expressing the criticality of scenarios. Numerical tests on two logical scenarios of the case study demonstrate that the approach can identify critical scenarios within a limited number of closed-loop experiments.
Submitted 8 September, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Enhanced Effective Aperture Distribution Function for Characterizing Large-Scale Antenna Arrays
Authors:
Xuesong Cai,
Meifang Zhu,
Aleksei Fedorov,
Fredrik Tufvesson
Abstract:
Accurate characterization of large-scale antenna arrays is growing in importance and complexity for fifth-generation (5G) and beyond systems, as they feature more antenna elements and require increased overall performance. The full 3D patterns of all antenna elements in the array need to be characterized because they generally differ due to construction inaccuracy, coupling, asymmetry of the antenna array, etc. The effective aperture distribution function (EADF) can provide an analytic description of an antenna array based on a full-sphere measurement of the array in an anechoic chamber. However, as the array aperture increases, denser spatial samples are needed for the EADF due to large distance offsets of array elements from the reference point in the anechoic chamber, leading to a prohibitive measurement time and increased complexity of the EADF. In this paper, we present the EADF applied to large-scale arrays and highlight issues caused by the large array aperture. To overcome these issues, an enhanced EADF is proposed with a low complexity that is intrinsically determined by the characteristics of each array element rather than the array aperture. The enhanced EADF is validated using experimental measurements conducted in the 27-30 GHz frequency band with a relatively large planar array.
Submitted 7 June, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Multi-Robot-Assisted Human Crowd Evacuation using Navigation Velocity Fields
Authors:
Tongjia Zheng,
Zhenyuan Yuan,
Mollik Nayyar,
Alan R. Wagner,
Minghui Zhu,
Hai Lin
Abstract:
This work studies a robot-assisted crowd evacuation problem where we control a small group of robots to guide a large human crowd to safe locations. The challenge lies in how to model human-robot interactions and design robot controls to indirectly control a human population that significantly outnumbers the robots. To address the challenge, we treat the crowd as a continuum and formulate the evacuation objective as driving the crowd density to target locations. We propose a novel mean-field model which consists of a family of microscopic equations that explicitly model how human motions are locally guided by the robots and an associated macroscopic equation that describes how the crowd density is controlled by the navigation velocity fields generated by all robots. Then, we design density feedback controllers for the robots to dynamically adjust their states such that the generated navigation velocity fields drive the crowd density to a target density. Stability guarantees of the proposed controllers are proven. Agent-based simulations are included to evaluate the proposed evacuation algorithms.
Submitted 20 September, 2022;
originally announced September 2022.
-
Distributed Safe Learning and Planning for Multi-robot Systems
Authors:
Zhenyuan Yuan,
Minghui Zhu
Abstract:
This paper considers the problem of online multi-robot motion planning with general nonlinear dynamics subject to unknown external disturbances. We propose dSLAP, a distributed safe learning and planning framework that allows the robots to safely navigate through the environments by coupling online learning and motion planning. Gaussian process regression is used to learn the disturbances online with uncertainty quantification. The planning algorithm ensures collision avoidance against the learning uncertainty and utilizes set-valued analysis to achieve fast adaptation in response to the newly learned models. A set-valued model predictive control problem is then formulated and solved to return a control policy that balances actively exploring the unknown disturbances against reaching goal regions. Sufficient conditions are established to guarantee the safety of the robots in the absence of a backup policy. Monte Carlo simulations are conducted for evaluation.
Submitted 25 May, 2025; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Real-time Dual-channel 2 * 2 MIMO Fiber-THz-Fiber Seamless Integration System at 385 GHz and 435 GHz
Authors:
Jiao Zhang,
Min Zhu,
Bingchang Hua,
Mingzheng Lei,
Yuancheng Cai,
Liang Tian,
Yucong Zou,
Like Ma,
Yongming Huang,
Jianjun Yu,
Xiaohu You
Abstract:
We demonstrate the first practical real-time dual-channel fiber-THz-fiber 2 * 2 MIMO seamless integration system with a record net data rate of 2 * 103.125 Gb/s at 385 GHz and 435 GHz over two spans of 20 km SSMF and 3 m wireless link.
Submitted 24 June, 2022;
originally announced June 2022.
-
Analytical Interpretation of Latent Codes in InfoGAN with SAR Images
Authors:
Zhenpeng Feng,
Milos Dakovic,
Hongbing Ji,
Mingzhe Zhu,
Ljubisa Stankovic
Abstract:
Generative Adversarial Networks (GANs) can synthesize abundant photo-realistic synthetic aperture radar (SAR) images. Some recent GANs (e.g., InfoGAN) are even able to edit specific properties of the synthesized images by introducing latent codes. This is crucial for SAR image synthesis, since the targets in real SAR images have different properties due to the imaging mechanism. Despite the success of InfoGAN in manipulating properties, a clear explanation of how these latent codes affect synthesized properties is still lacking; editing specific properties therefore usually relies on empirical trials, which are unreliable and time-consuming. In this paper, we show that latent codes are disentangled to affect the properties of SAR images in a non-linear manner. By introducing some property estimators for latent codes, we are able to provide a completely analytical nonlinear model to decompose the entangled causality between latent codes and different properties. The qualitative and quantitative experimental results further reveal that the properties can be calculated from latent codes and, inversely, that satisfactory latent codes can be estimated given desired properties. In this case, properties can be manipulated by latent codes as expected.
Submitted 26 May, 2022;
originally announced May 2022.
-
Splicing Detection and Localization In Satellite Imagery Using Conditional GANs
Authors:
Emily R. Bartusiak,
Sri Kalyan Yarlagadda,
David Güera,
Paolo Bestagini,
Stefano Tubaro,
Fengqing M. Zhu,
Edward J. Delp
Abstract:
The widespread availability of image editing tools and improvements in image processing techniques make image manipulation very easy. Oftentimes, easy-to-use yet sophisticated image manipulation tools yield distortions/changes imperceptible to the human observer. Distribution of forged images can have drastic ramifications, especially when coupled with the speed and vastness of the Internet. Therefore, verifying image integrity poses an immense and important challenge to the digital forensic community. Satellite images specifically can be modified in a number of ways, including the insertion of objects to hide existing scenes and structures. In this paper, we describe the use of a Conditional Generative Adversarial Network (cGAN) to identify the presence of such spliced forgeries within satellite images. Additionally, we identify their locations and shapes. Trained on pristine and falsified images, our method achieves high success on these detection and localization objectives.
Submitted 3 May, 2022;
originally announced May 2022.
-
A Formal Safety Characterization of Advanced Driver Assist Systems in the Car-Following Regime with Scenario-Sampling
Authors:
Bowen Weng,
Minghao Zhu,
Keith Redmill
Abstract:
The capability to follow a lead-vehicle and avoid rear-end collisions is one of the most important functionalities for human drivers and various Advanced Driver Assist Systems (ADAS). Existing safety performance justification of the car-following systems either relies on simple concrete scenarios with biased surrogate metrics or requires a significantly long driving distance for risk observation and inference. In this paper, we propose a guaranteed unbiased and sampling efficient scenario-based safety evaluation framework inspired by the previous work on $εδ$-almost safe set quantification. The proposal characterizes the complete safety performance of the test subject in the car-following regime. The performance of the proposed method is also demonstrated in challenging cases including some widely adopted car-following decision-making modules and the commercially available Openpilot driving stack by CommaAI.
Submitted 23 May, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Beamspace Multidimensional ESPRIT Approaches for Simultaneous Localization and Communications
Authors:
Fan Jiang,
Fuxi Wen,
Yu Ge,
Meifang Zhu,
Henk Wymeersch,
Fredrik Tufvesson
Abstract:
Modern wireless communication systems operating at high carrier frequencies are characterized by a high dimensionality of the underlying parameter space (including channel gains, angles, delays, and possibly Doppler shifts). Estimating these parameters is valuable for communication purposes, but also for localization and sensing, making channel estimation a critical component in any joint communication and localization or sensing application. The high dimensionality makes it difficult to use search-based methods such as maximum likelihood. Search-free methods such as ESPRIT provide an attractive alternative, but require a complex decomposition step in both the tensor and matrix versions of ESPRIT. To mitigate this, we propose, develop, and analyze a reduced-complexity beamspace ESPRIT method. Complexity is reduced both by beamspace processing and by a low-complexity implementation of the singular value decomposition. A novel perturbation analysis provides important insights into both channel estimation and localization performance. The proposed method is compared to the tensor ESPRIT method in terms of channel estimation, communication, localization, and sensing performance, further validating the perturbation analysis.
Submitted 14 November, 2021;
originally announced November 2021.
-
Stain-free Detection of Embryo Polarization using Deep Learning
Authors:
Cheng Shen,
Adiyant Lamba,
Meng Zhu,
Ray Zhang,
Changhuei Yang,
Magdalena Zernicka Goetz
Abstract:
Polarization of the mammalian embryo at the right developmental time is critical for its development to term and would be valuable in assessing the potential of human embryos. However, tracking polarization requires invasive fluorescence staining, impermissible in the in vitro fertilization clinic. Here, we report the use of artificial intelligence to detect polarization from unstained time-lapse movies of mouse embryos. We assembled a dataset of bright-field movie frames from 8-cell-stage embryos, side-by-side with corresponding images of fluorescent markers of cell polarization. We then used an ensemble learning model to detect whether any bright-field frame showed an embryo before or after onset of polarization. Our resulting model has an accuracy of 85% for detecting polarization, significantly outperforming human volunteers trained on the same data (61% accuracy). We discovered that our self-learning model focuses upon the angle between cells as one known cue for compaction, which precedes polarization, but it outperforms the use of this cue alone. By compressing three-dimensional time-lapse image data into two dimensions, we are able to reduce data to an easily manageable size for deep learning processing. In conclusion, we describe a method for detecting a key developmental feature of embryo development that avoids clinically impermissible fluorescence staining.
Submitted 8 November, 2021;
originally announced November 2021.