-
Out-of-Distribution Semantic Occupancy Prediction
Authors:
Yuheng Zhang,
Mengfei Duan,
Kunyu Peng,
Yuhang Wang,
Ruiping Liu,
Fei Teng,
Kai Luo,
Zhiyong Li,
Kailun Yang
Abstract:
3D Semantic Occupancy Prediction is crucial for autonomous driving, providing a dense, semantically rich environmental representation. However, existing methods focus on in-distribution scenes, making them susceptible to Out-of-Distribution (OoD) objects and long-tail distributions, which increases the risk of undetected anomalies and misinterpretations, posing safety hazards. To address these challenges, we introduce Out-of-Distribution Semantic Occupancy Prediction, targeting OoD detection in 3D voxel space. To fill the gaps in the dataset, we propose a Synthetic Anomaly Integration Pipeline that injects synthetic anomalies while preserving realistic spatial and occlusion patterns, enabling the creation of two datasets: VAA-KITTI and VAA-KITTI-360. We introduce OccOoD, a novel framework integrating OoD detection into 3D semantic occupancy prediction, with Voxel-BEV Progressive Fusion (VBPF) leveraging an RWKV-based branch to enhance OoD detection via geometry-semantic fusion. Experimental results demonstrate that OccOoD achieves state-of-the-art OoD detection with an AuROC of 67.34% and an AuPRCr of 29.21% within a 1.2m region, while maintaining competitive occupancy prediction performance. The established datasets and source code will be made publicly available at https://github.com/7uHeng/OccOoD.
Submitted 26 June, 2025;
originally announced June 2025.
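The abstract above reports OoD detection quality as AuROC. As a minimal sketch of how that metric can be computed from per-voxel anomaly scores, the following uses the rank-sum (Mann-Whitney U) identity. The function name `auroc`, the toy scores, and the convention that label 1 marks an OoD voxel are illustrative assumptions, not taken from the OccOoD code.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: anomaly scores, higher means "more likely OoD" (assumed convention).
    labels: 1 for OoD (positive), 0 for in-distribution (negative).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks.
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AuROC needs at least one positive and one negative")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy example: four voxels, two of them OoD.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The value is the probability that a randomly chosen OoD voxel receives a higher score than a randomly chosen in-distribution voxel, with ties counted as one half.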
-
Panoramic Out-of-Distribution Segmentation
Authors:
Mengfei Duan,
Kailun Yang,
Yuheng Zhang,
Yihong Cao,
Fei Teng,
Kai Luo,
Jiaming Zhang,
Zhiyong Li,
Shutao Li
Abstract:
Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to background clutter and pixel distortions. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), achieving OoS for panoramas. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. Besides, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.
Submitted 6 May, 2025;
originally announced May 2025.
-
Channel Estimation and Hybrid Precoding for Massive MIMO-OTFS System With Doubly Squint
Authors:
Mingming Duan,
Pengfei Zhang,
Shun Zhang,
Yao Ge,
Octavia A. Dobre,
Chau Yuen
Abstract:
Orthogonal time frequency space (OTFS) modulation and massive multi-input multi-output (MIMO) are promising technologies for next generation wireless communication systems for their abilities to counteract the issue of high mobility with large Doppler spread and mitigate the channel path attenuation, respectively. The natural integration of massive MIMO with OTFS in millimeter-wave systems can improve communication data rate and enhance the spectral efficiency. However, when transmitting wideband signals with large-scale arrays, the beam squint effect may occur, causing discrepancies in beam directions across subcarriers in multi-carrier systems. Moreover, the high-mobility wideband millimeter wave communications can induce the Doppler squint effect, leading to different Doppler shifts among the subcarriers. Both beam squint effect and Doppler squint effect (denoted as doubly squint effect) can degrade communication performance significantly. In this paper, we present an efficient channel estimation and hybrid precoding scheme to address the doubly squint effect in massive MIMO-OTFS systems. We first characterize the wideband channel model and the input-output relationship for massive MIMO-OTFS transmission considering doubly squint effect. We then mathematically derive the impact of channel parameters on chirp pilots under the doubly squint effect. Additionally, we develop a peak-index-based channel estimation scheme. By leveraging the results from channel estimation, we propose a hybrid precoding method to mitigate the doubly squint effect in downlink transmission scenarios. Finally, simulation results validate the effectiveness of our proposed scheme and show its superiority over the existing schemes.
Submitted 11 April, 2025;
originally announced April 2025.
-
Omnidirectional Multi-Object Tracking
Authors:
Kai Luo,
Hao Shi,
Sheng Wu,
Fei Teng,
Mengfei Duan,
Chang Huang,
Yuhang Wang,
Kaiwei Wang,
Kailun Yang
Abstract:
Panoramic imagery, with its 360° field of view, offers comprehensive information to support Multi-Object Tracking (MOT) in capturing spatial and temporal relationships of surrounding objects. However, most MOT algorithms are tailored for pinhole images with limited views, impairing their effectiveness in panoramic settings. Additionally, panoramic image distortions, such as resolution loss, geometric deformation, and uneven lighting, hinder direct adaptation of existing MOT methods, leading to significant performance degradation. To address these challenges, we propose OmniTrack, an omnidirectional MOT framework that incorporates Tracklet Management to introduce temporal cues, FlexiTrack Instances for object localization and association, and the CircularStatE Module to alleviate image and geometric distortions. This integration enables tracking in panoramic field-of-view scenarios, even under rapid sensor motion. To mitigate the lack of panoramic MOT datasets, we introduce the QuadTrack dataset--a comprehensive panoramic dataset collected by a quadruped robot, featuring diverse challenges such as panoramic fields of view, intense motion, and complex environments. Extensive experiments on the public JRDB dataset and the newly introduced QuadTrack benchmark demonstrate the state-of-the-art performance of the proposed framework. OmniTrack achieves a HOTA score of 26.92% on JRDB, representing an improvement of 3.43%, and further achieves 23.45% on QuadTrack, surpassing the baseline by 6.81%. The established dataset and source code are available at https://github.com/xifen523/OmniTrack.
Submitted 23 March, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
One-Shot Affordance Grounding of Deformable Objects in Egocentric Organizing Scenes
Authors:
Wanjun Jia,
Fan Yang,
Mengfei Duan,
Xianchi Chen,
Yinxi Wang,
Yiming Jiang,
Wenrui Chen,
Kailun Yang,
Zhiyong Li
Abstract:
Deformable object manipulation in robotics presents significant challenges due to uncertainties in component properties, diverse configurations, visual interference, and ambiguous prompts. These factors complicate both perception and control tasks. To address these challenges, we propose a novel method for One-Shot Affordance Grounding of Deformable Objects (OS-AGDO) in egocentric organizing scenes, enabling robots to recognize previously unseen deformable objects with varying colors and shapes using minimal samples. Specifically, we first introduce the Deformable Object Semantic Enhancement Module (DefoSEM), which enhances hierarchical understanding of the internal structure and improves the ability to accurately identify local features, even under conditions of weak component information. Next, we propose the ORB-Enhanced Keypoint Fusion Module (OEKFM), which optimizes feature extraction of key components by leveraging geometric constraints and improves adaptability to diversity and visual interference. Additionally, we propose an instance-conditional prompt based on image data and task context, which effectively mitigates the issue of region ambiguity caused by prompt words. To validate these methods, we construct a diverse real-world dataset, AGDDO15, which includes 15 common types of deformable objects and their associated organizational actions. Experimental results demonstrate that our approach significantly outperforms state-of-the-art methods, achieving improvements of 6.2%, 3.2%, and 2.9% in KLD, SIM, and NSS metrics, respectively, while exhibiting high generalization performance. Source code and benchmark dataset will be publicly available at https://github.com/Dikay1/OS-AGDO.
Submitted 2 March, 2025;
originally announced March 2025.
-
Continuous K-space Recovery Network with Image Guidance for Fast MRI Reconstruction
Authors:
Yucong Meng,
Zhiwei Yang,
Minghong Duan,
Yonghong Shi,
Zhijian Song
Abstract:
Magnetic resonance imaging (MRI) is a crucial tool for clinical diagnosis but faces the challenge of long scanning times. To reduce the acquisition time, fast MRI reconstruction aims to restore high-quality images from the undersampled k-space. Existing methods typically train deep learning models to map the undersampled data to artifact-free MRI images. However, these studies often overlook the unique properties of k-space and directly apply general networks designed for image processing to k-space recovery, leaving the precise learning of k-space largely underexplored. In this work, we propose a continuous k-space recovery network from a new perspective of implicit neural representation with image domain guidance, which boosts the performance of MRI reconstruction. Specifically, (1) an encoder-decoder structure based on implicit neural representation is customized to continuously query unsampled k-values; (2) an image guidance module is designed to mine the semantic information from the low-quality MRI images to further guide the k-space recovery; and (3) a multi-stage training strategy is proposed to recover dense k-space progressively. Extensive experiments conducted on CC359, fastMRI, and IXI datasets demonstrate the effectiveness of our method and its superiority over other competitors.
Submitted 13 March, 2025; v1 submitted 17 November, 2024;
originally announced November 2024.
-
An efficient dual-branch framework via implicit self-texture enhancement for arbitrary-scale histopathology image super-resolution
Authors:
Minghong Duan,
Linhao Qu,
Zhiwei Yang,
Manning Wang,
Chenxi Zhang,
Zhijian Song
Abstract:
High-quality whole-slide scanning is expensive, complex, and time-consuming, thus limiting the acquisition and utilization of high-resolution histopathology images in daily clinical work. Deep learning-based single-image super-resolution (SISR) techniques provide an effective way to solve this problem. However, the existing SISR models applied in histopathology images can only work in fixed integer scaling factors, decreasing their applicability. Though methods based on implicit neural representation (INR) have shown promising results in arbitrary-scale super-resolution (SR) of natural images, applying them directly to histopathology images is inadequate because they have unique fine-grained image textures different from natural images. Thus, we propose an Implicit Self-Texture Enhancement-based dual-branch framework (ISTE) for arbitrary-scale SR of histopathology images to address this challenge. The proposed ISTE contains a feature aggregation branch and a texture learning branch. We employ the feature aggregation branch to enhance the learning of the local details for SR images while utilizing the texture learning branch to enhance the learning of high-frequency texture details. Then, we design a two-stage texture enhancement strategy to fuse the features from the two branches to obtain the SR images. Experiments on publicly available datasets, including TMA, HistoSR, and the TCGA lung cancer datasets, demonstrate that ISTE outperforms existing fixed-scale and arbitrary-scale SR algorithms across various scaling factors. Additionally, extensive experiments have shown that the histopathology images reconstructed by the proposed ISTE are applicable to downstream pathology image analysis tasks.
Submitted 7 November, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Towards Arbitrary-scale Histopathology Image Super-resolution: An Efficient Dual-branch Framework based on Implicit Self-texture Enhancement
Authors:
Linhao Qu,
Minghong Duan,
Zhiwei Yang,
Manning Wang,
Zhijian Song
Abstract:
Existing super-resolution models for pathology images can only work in fixed integer magnifications and have limited performance. Though implicit neural network-based methods have shown promising results in arbitrary-scale super-resolution of natural images, it is not effective to directly apply them in pathology images, because pathology images have special fine-grained image textures different from natural images. To address this challenge, we propose a dual-branch framework with an efficient self-texture enhancement mechanism for arbitrary-scale super-resolution of pathology images. Extensive experiments on two public datasets show that our method outperforms both existing fixed-scale and arbitrary-scale algorithms. To the best of our knowledge, this is the first work to achieve arbitrary-scale super-resolution in the field of pathology images. Codes will be available.
Submitted 9 April, 2023;
originally announced April 2023.
-
A physics-guided data-driven feedforward tracking controller for systems with unmodeled dynamics -- applied to 3D printing
Authors:
Cheng-Hao Chou,
Molong Duan,
Chinedum E. Okwudire
Abstract:
A hybrid (i.e., physics-guided data-driven) feedforward tracking controller is proposed for systems with unmodeled linear or nonlinear dynamics. The controller is based on the filtered basis function (FBF) approach, hence it is called a hybrid FBF controller. It formulates the feedforward control input to a system as a linear combination of a set of basis functions whose coefficients are selected to minimize tracking errors. The basis functions are filtered using a combination of two linear models to predict and minimize the tracking errors. The first model is physics-based and remains unaltered during the execution of the controller, while the second is data-driven and is continuously updated during the execution of the controller. To ensure its practicality and safe learning, the proposed hybrid FBF controller is equipped with the ability to handle delays in data acquisition and to detect impending instability due to its inherent data-driven feedback loop. Its effectiveness is demonstrated via application to vibration compensation of a 3D printer with unmodeled linear and nonlinear dynamics. Thanks to the proposed hybrid FBF controller, the tracking accuracy of the 3D printer is significantly improved in experiments involving high-speed printing, compared to a standard FBF controller that does not incorporate a data-driven model. Furthermore, the ability of the hybrid FBF controller to detect and, hence, potentially avoid impending instability is demonstrated offline using data collected online from experiments.
Submitted 23 June, 2022;
originally announced June 2022.
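The abstract above describes the control input as a linear combination of basis functions whose coefficients minimize tracking error. A minimal worked form of that idea, for a generic filtered basis function setup, is sketched below. The symbols are assumptions introduced for illustration: $\mathbf{y}_d$ is the desired trajectory, $\boldsymbol{\Phi}$ stacks the $n$ basis functions as columns, $\hat{\mathbf{G}}$ is a lifted linear model of the system, and $\tilde{\boldsymbol{\Phi}}$ is assumed to have full column rank. How the hybrid controller combines its physics-based and data-driven models into $\hat{\mathbf{G}}$ is not specified in the abstract and is not reproduced here.

```latex
\begin{align}
  \mathbf{u} &= \boldsymbol{\Phi}\boldsymbol{\gamma},
    \qquad \boldsymbol{\Phi} = [\boldsymbol{\varphi}_1 \;\cdots\; \boldsymbol{\varphi}_n]
    && \text{(input as a combination of basis functions)} \\
  \hat{\mathbf{y}} &= \hat{\mathbf{G}}\mathbf{u}
    = \hat{\mathbf{G}}\boldsymbol{\Phi}\boldsymbol{\gamma}
    = \tilde{\boldsymbol{\Phi}}\boldsymbol{\gamma}
    && \text{(basis functions filtered through the model)} \\
  \boldsymbol{\gamma}^{\ast} &= \arg\min_{\boldsymbol{\gamma}}
    \left\lVert \mathbf{y}_d - \tilde{\boldsymbol{\Phi}}\boldsymbol{\gamma} \right\rVert_2^2
    = \left(\tilde{\boldsymbol{\Phi}}^{\top}\tilde{\boldsymbol{\Phi}}\right)^{-1}
      \tilde{\boldsymbol{\Phi}}^{\top}\mathbf{y}_d
    && \text{(least-squares coefficients)}
\end{align}
```

Under these assumptions the predicted tracking error $\mathbf{y}_d - \tilde{\boldsymbol{\Phi}}\boldsymbol{\gamma}^{\ast}$ is the component of $\mathbf{y}_d$ orthogonal to the span of the filtered basis functions, so the residual depends on how well the model $\hat{\mathbf{G}}$ matches the true dynamics.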