
Showing 1–50 of 120 results for author: Zhu, W

Searching in archive eess.
  1. arXiv:2507.08178  [pdf, ps, other]

    eess.IV cs.CV

    Cracking Instance Jigsaw Puzzles: An Alternative to Multiple Instance Learning for Whole Slide Image Analysis

    Authors: Xiwen Chen, Peijie Qiu, Wenhui Zhu, Hao Wang, Huayu Li, Xuanzhao Dong, Xiaotong Sun, Xiaobing Yu, Yalin Wang, Abolfazl Razi, Aristeidis Sotiras

    Abstract: While multiple instance learning (MIL) has been shown to be a promising approach for histopathological whole slide image (WSI) analysis, its reliance on permutation invariance significantly limits its capacity to effectively uncover semantic correlations between instances within WSIs. Based on our empirical and theoretical investigations, we argue that approaches that are not permutation-invariant but…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV2025

  2. arXiv:2507.03729  [pdf, ps, other]

    eess.SP

    Improving SAGIN Resilience to Jamming with Reconfigurable Intelligent Surfaces

    Authors: Leila Marandi, Khaled Humadi, Gunes Karabulut Kurt, Wessam Ajib, Wei-Ping Zhu

    Abstract: This study investigates the anti-jamming space-air-ground integrated network (SAGIN) scenario wherein a reconfigurable intelligent surface (RIS) is deployed on a fixed Unmanned Aerial Vehicle (UAV) to counteract malevolent jamming attacks. In contrast to existing research, in this paper, we consider that a Low Earth Orbit (LEO) satellite is sending the signal to the user on the ground in the prese…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted at IEEE VTC-Fall 2025. 7 pages, 4 figures

  3. arXiv:2506.23203  [pdf, ps, other]

    eess.SP cs.AI

    Multi-Branch DNN and CRLB-Ratio-Weight Fusion for Enhanced DOA Sensing via a Massive H$^2$AD MIMO Receiver

    Authors: Feng Shu, Jiatong Bai, Di Wu, Wei Zhu, Bin Deng, Fuhui Zhou, Jiangzhou Wang

    Abstract: As a green MIMO structure, massive H$^2$AD is viewed as a potential technology for the future 6G wireless network. For such a structure, it is a challenging task to design a low-complexity and high-performance fusion of target direction values sensed by different sub-array groups with less use of prior knowledge. To address this issue, a lightweight Cramer-Rao lower bound (CRLB)-ratio-weight fusi…

    Submitted 29 June, 2025; originally announced June 2025.

  4. arXiv:2506.21349  [pdf, ps, other]

    cs.CV eess.IV

    Generalizable Neural Electromagnetic Inverse Scattering

    Authors: Yizhe Cheng, Chunxun Tian, Haoru Wang, Wentao Zhu, Xiaoxuan Ma, Yizhou Wang

    Abstract: Solving Electromagnetic Inverse Scattering Problems (EISP) is fundamental in applications such as medical imaging, where the goal is to reconstruct the relative permittivity from the scattered electromagnetic field. This inverse process is inherently ill-posed and highly nonlinear, making it particularly challenging. A recent machine learning-based approach, Img-Interiors, shows promising results by l…

    Submitted 1 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  5. arXiv:2506.18671  [pdf, ps, other]

    cs.SD cs.CV cs.GR eess.AS

    TCDiff++: An End-to-end Trajectory-Controllable Diffusion Model for Harmonious Music-Driven Group Choreography

    Authors: Yuqin Dai, Wanlu Zhu, Ronghui Li, Xiu Li, Zhenyu Zhang, Jun Li, Jian Yang

    Abstract: Music-driven dance generation has garnered significant attention due to its wide range of industrial applications, particularly in the creation of group choreography. During the group dance generation process, however, most existing methods still face three primary issues: multi-dancer collisions, single-dancer foot sliding and abrupt swapping in the generation of long group dance. In this paper,…

    Submitted 26 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.12908  [pdf, ps, other]

    eess.SP

    Low-Latency Terrestrial Interference Detection for Satellite-to-Device Communications

    Authors: Runnan Liu, Weifeng Zhu, Shu Sun, Wenjun Zhang

    Abstract: Direct satellite-to-device communication is a promising future direction due to its lower latency and enhanced efficiency. However, intermittent and unpredictable terrestrial interference significantly affects system reliability and performance. Continuously employing sophisticated interference mitigation techniques is practically inefficient. Motivated by the periodic idle intervals characteristi…

    Submitted 15 June, 2025; originally announced June 2025.

    Comments: 6 pages

  7. arXiv:2504.14783  [pdf, other]

    cs.CV cs.AI eess.IV stat.ML

    How Effective Can Dropout Be in Multiple Instance Learning ?

    Authors: Wenhui Zhu, Peijie Qiu, Xiwen Chen, Zhangsihao Yang, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Multiple Instance Learning (MIL) is a popular weakly-supervised method for various applications, with a particular interest in histological whole slide image (WSI) classification. Due to the gigapixel resolution of WSI, applications of MIL in WSI typically necessitate a two-stage training scheme: first, extract features from the pre-trained backbone and then perform MIL aggregation. However, it is…

    Submitted 20 May, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted by ICML2025

  8. arXiv:2504.11162  [pdf, ps, other]

    eess.SP cs.IT

    Scalable Transceiver Design for Multi-User Communication in FDD Massive MIMO Systems via Deep Learning

    Authors: Lin Zhu, Weifeng Zhu, Shuowen Zhang, Shuguang Cui, Liang Liu

    Abstract: This paper addresses the joint transceiver design, including pilot transmission, channel feature extraction and feedback, as well as precoding, for low-overhead downlink massive multiple-input multiple-output (MIMO) communication in frequency-division duplex (FDD) systems. Although deep learning (DL) has shown great potential in tackling this problem, existing methods often suffer from poor scalab…

    Submitted 15 April, 2025; originally announced April 2025.

  9. arXiv:2504.10181  [pdf]

    eess.SY

    A New Paradigm in IBR Modeling for Power Flow and Short Circuit Analysis

    Authors: Zahid Javid, Firdous Ul Nazir, Wentao Zhu, Diptargha Chakravorty, Ahmed Aboushady, Mohamed Galeela

    Abstract: The fault characteristics of inverter-based resources (IBRs) differ from those of conventional synchronous generators. The fault response of IBRs is non-linear due to saturation states and mainly determined by fault ride through (FRT) strategies of the associated voltage source converter (VSC). This results in prohibitively large solution times for power flows considering these short circuit charact…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 12 Pages, First Revision Submitted

  10. arXiv:2502.14260  [pdf, other]

    eess.IV cs.AI cs.CV

    EyeBench: A Call for More Rigorous Evaluation of Retinal Image Enhancement

    Authors: Wenhui Zhu, Xuanzhao Dong, Xin Li, Yujian Xiong, Xiwen Chen, Peijie Qiu, Vamsi Krishna Vasa, Zhangsihao Yang, Yi Su, Oana Dumitrascu, Yalin Wang

    Abstract: Over the past decade, generative models have achieved significant success in enhancing fundus images. However, the evaluation of these models still presents a considerable challenge. A comprehensive evaluation benchmark for fundus image enhancement is indispensable for three main reasons: 1) The existing denoising metrics (e.g., PSNR, SSIM) are hard to extend to downstream real-world clinical r…

    Submitted 19 February, 2025; originally announced February 2025.

  11. arXiv:2502.02295  [pdf, ps, other]

    eess.SP cs.IT

    Intelligent Reflecting Surface Based Localization of Mixed Near-Field and Far-Field Targets

    Authors: Weifeng Zhu, Qipeng Wang, Shuowen Zhang, Boya Di, Liang Liu, Yonina C. Eldar

    Abstract: This paper considers an intelligent reflecting surface (IRS)-assisted bi-static localization architecture for the sixth-generation (6G) integrated sensing and communication (ISAC) network. The system consists of a transmit user, a receive base station (BS), an IRS, and multiple targets in either the far-field or near-field region of the IRS. In particular, we focus on the challenging scenario wher…

    Submitted 4 February, 2025; originally announced February 2025.

  12. arXiv:2501.08809  [pdf, other]

    cs.SD cs.AI eess.AS

    XMusic: Towards a Generalized and Controllable Symbolic Music Generation Framework

    Authors: Sida Tian, Can Zhang, Wei Yuan, Wei Tan, Wenjie Zhu

    Abstract: In recent years, remarkable advancements in artificial intelligence-generated content (AIGC) have been achieved in the fields of image synthesis and text generation, generating content comparable to that produced by humans. However, the quality of AI-generated music has not yet reached this standard, primarily due to the challenge of effectively controlling musical emotions and ensuring high-quali…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: accepted by TMM

  13. arXiv:2412.19071  [pdf, other]

    eess.SP

    Movable Intelligent Surface (MIS) for Wireless Communications: Architecture, Modeling, Algorithm, and Prototyping

    Authors: Ziyuan Zheng, Qingqing Wu, Wen Chen, Xiangming Wu, Weiren Zhu

    Abstract: Reconfigurable intelligent surfaces enhance wireless systems by reshaping propagation environments. However, dynamic metasurfaces (MSs) with numerous phase-shift elements incur undesired control and hardware costs. In contrast, static MSs (SMSs), configured with static phase shifts pre-designed for specific communication demands, offer a cost-effective alternative by eliminating element-wise tunin…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: 13 pages, 10 figures, submitted to IEEE Transactions for possible publication

  14. arXiv:2412.07040  [pdf, other]

    eess.SP

    Beyond Idle Channels: Unlocking Idle Space with Signal Alignment in Massive MIMO Cognitive Radio Networks

    Authors: Weidong Zhu, Xueqian Li, Longwei Wang, Zheng Zhang

    Abstract: Cognitive radio networks (CRNs) have traditionally focused on utilizing idle channels to enhance spectrum efficiency. However, as wireless networks grow denser, channel-centric strategies face increasing limitations. This paper introduces a paradigm shift by exploring the underutilized potential of idle spatial dimensions, termed idle space, in co-channel transmissions. By integrating massive mult…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 9 pages, 7 figures

  15. arXiv:2411.01403  [pdf, other]

    eess.IV cs.CV

    TPOT: Topology Preserving Optimal Transport in Retinal Fundus Image Enhancement

    Authors: Xuanzhao Dong, Wenhui Zhu, Xin Li, Guoxin Sun, Yi Su, Oana M. Dumitrascu, Yalin Wang

    Abstract: Retinal fundus photography enhancement is important for diagnosing and monitoring retinal diseases. However, early approaches to retinal image enhancement, such as those based on Generative Adversarial Networks (GANs), often struggle to preserve the complex topological information of blood vessels, resulting in spurious or missing vessel structures. The persistence diagram, which captures topologi…

    Submitted 2 November, 2024; originally announced November 2024.

  16. arXiv:2410.22674  [pdf]

    eess.IV cs.LG

    Dynamic PET Image Prediction Using a Network Combining Reversible and Irreversible Modules

    Authors: Jie Sun, Qian Xia, Chuanfu Sun, Yumei Chen, Huafeng Liu, Wentao Zhu, Qiegen Liu

    Abstract: Dynamic positron emission tomography (PET) images can reveal the distribution of tracers in the organism and the dynamic processes involved in biochemical reactions, and it is widely used in clinical practice. Despite the high effectiveness of dynamic PET imaging in studying the kinetics and metabolic processes of radiotracers, prolonged scan times can cause discomfort for both patients and medic…

    Submitted 29 October, 2024; originally announced October 2024.

  17. arXiv:2410.15036  [pdf, other]

    eess.IV cs.CV

    EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices

    Authors: Xin Li, Wenhui Zhu, Xuanzhao Dong, Oana M. Dumitrascu, Yalin Wang

    Abstract: With the rapid development of deep learning, CNN-based U-shaped networks have succeeded in medical image segmentation and are widely applied for various tasks. However, their limitations in capturing global features hinder their performance in complex segmentation tasks. The rise of Vision Transformer (ViT) has effectively compensated for this deficiency of CNNs and promoted the application of ViT…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: 5 pages, 3 figures

  18. arXiv:2410.11578  [pdf, other]

    eess.IV cs.AI cs.CV

    STA-Unet: Rethink the semantic redundant for Medical Imaging Segmentation

    Authors: Vamsi Krishna Vasa, Wenhui Zhu, Xiwen Chen, Peijie Qiu, Xuanzhao Dong, Yalin Wang

    Abstract: In recent years, significant progress has been made in the medical image analysis domain using convolutional neural networks (CNNs). In particular, deep neural networks based on a U-shaped architecture (UNet) with skip connections have been adopted for several medical imaging tasks, including organ segmentation. Despite their great success, CNNs are not good at learning global or semantic features…

    Submitted 13 October, 2024; originally announced October 2024.

  19. arXiv:2409.19595  [pdf, other]

    cs.SD cs.LG eess.AS

    Solution for Temporal Sound Localisation Task of ECCV Second Perception Test Challenge 2024

    Authors: Haowei Gu, Weihao Zhu, Yang Yang

    Abstract: This report proposes an improved method for the Temporal Sound Localisation (TSL) task, which localizes and classifies the sound events occurring in the video according to a predefined set of sound classes. The champion solution from last year's first competition has explored the TSL by fusing audio and video modalities with the same weight. Considering the TSL task aims to localize sound events,…

    Submitted 29 September, 2024; originally announced September 2024.

  20. arXiv:2409.10966  [pdf, other]

    eess.IV cs.CV

    CUNSB-RFIE: Context-aware Unpaired Neural Schrödinger Bridge in Retinal Fundus Image Enhancement

    Authors: Xuanzhao Dong, Vamsi Krishna Vasa, Wenhui Zhu, Peijie Qiu, Xiwen Chen, Yi Su, Yujian Xiong, Zhangsihao Yang, Yanxi Chen, Yalin Wang

    Abstract: Retinal fundus photography is significant in diagnosing and monitoring retinal diseases. However, systemic imperfections and operator/patient-related factors can hinder the acquisition of high-quality retinal images. Previous efforts in retinal image enhancement primarily relied on GANs, which are limited by the trade-off between training stability and output diversity. In contrast, the Schrödinge…

    Submitted 17 September, 2024; originally announced September 2024.

  21. arXiv:2409.07862  [pdf, other]

    eess.IV cs.CV

    Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement

    Authors: Vamsi Krishna Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana Dumitrascu, Yalin Wang

    Abstract: Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for carrying out accurate diagnoses and automated analyses. Fundus image enhancement is typically formulated as a distributi…

    Submitted 12 September, 2024; originally announced September 2024.

  22. arXiv:2408.15667  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Towards reliable respiratory disease diagnosis based on cough sounds and vision transformers

    Authors: Qian Wang, Zhaoyang Bu, Jiaxuan Mao, Wenyu Zhu, Jingya Zhao, Wei Du, Guochao Shi, Min Zhou, Si Chen, Jieming Qu

    Abstract: Recent advancements in deep learning techniques have sparked performance boosts in various real-world applications including disease diagnosis based on multi-modal medical data. Cough sound data-based respiratory disease (e.g., COVID-19 and Chronic Obstructive Pulmonary Disease) diagnosis has also attracted much attention. However, existing works usually utilise traditional machine learning or dee…

    Submitted 2 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

  23. arXiv:2408.03174  [pdf, ps, other]

    eess.SP cs.IT

    Joint Transmission and Compression Optimization for Networked Sensing with Limited-Capacity Fronthaul Links

    Authors: Weifeng Zhu, Shuowen Zhang, Liang Liu

    Abstract: This paper considers networked sensing in a cellular network, where multiple base stations (BSs) first compress their received echo signals from multiple targets and then forward the quantized signals to the central unit (CU) via limited-capacity fronthaul links, such that the CU can leverage all useful echo signals to perform high-resolution localization. Under this setup, we manage to characterize…

    Submitted 6 September, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE TWC; conference paper accepted by IEEE Globecom 2024

  24. arXiv:2407.13092  [pdf, other]

    eess.IV cs.CV

    CC-DCNet: Dynamic Convolutional Neural Network with Contrastive Constraints for Identifying Lung Cancer Subtypes on Multi-modality Images

    Authors: Yuan Jin, Gege Ma, Geng Chen, Tianling Lyu, Jan Egger, Junhui Lyu, Shaoting Zhang, Wentao Zhu

    Abstract: The accurate diagnosis of pathological subtypes of lung cancer is of paramount importance for follow-up treatments and prognosis managements. Assessment methods utilizing deep learning technologies have introduced novel approaches for clinical diagnosis. However, the majority of existing models rely solely on single-modality image input, leading to limited diagnostic accuracy. To this end, we prop…

    Submitted 17 July, 2024; originally announced July 2024.

  25. arXiv:2407.12271  [pdf, other]

    cs.CV eess.IV

    RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection

    Authors: Hao Wang, Wenhui Zhu, Jiayou Qin, Xin Li, Oana Dumitrascu, Xiwen Chen, Peijie Qiu, Abolfazl Razi

    Abstract: Retinal image analysis, particularly the detection of geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose are often coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured ima…

    Submitted 16 July, 2024; originally announced July 2024.

  26. arXiv:2407.06612  [pdf]

    eess.IV cs.CV cs.LG

    AI-based Automatic Segmentation of Prostate on Multi-modality Images: A Review

    Authors: Rui Jin, Derun Li, Dehui Xiang, Lei Zhang, Hailing Zhou, Fei Shi, Weifang Zhu, Jing Cai, Tao Peng, Xinjian Chen

    Abstract: Prostate cancer represents a major threat to health. Early detection is vital in reducing the mortality rate among prostate cancer patients. One approach involves using multi-modality (CT, MRI, US, etc.) computer-aided diagnosis (CAD) systems for the prostate region. However, prostate segmentation is challenging due to imperfections in the images and the prostate's complex tissue structure. The ad…

    Submitted 9 July, 2024; originally announced July 2024.

  27. arXiv:2407.03575  [pdf, other]

    eess.IV cs.CV

    DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. Indeed, few MIL methods have aimed at diversity mode…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  28. arXiv:2406.14896  [pdf, other]

    eess.IV cs.CV

    SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

    Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

    Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important facto…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted as a conference paper to 2024 MICCAI

  29. arXiv:2406.10856  [pdf, other]

    cs.NI eess.SY

    LEO Satellite Networks Assisted Geo-distributed Data Processing

    Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

    Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 6 pages, 5 figures

  30. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to achieve accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  31. arXiv:2403.12425  [pdf, other]

    cs.CV cs.SD eess.AS

    Multimodal Fusion Method with Spatiotemporal Sequences and Relationship Learning for Valence-Arousal Estimation

    Authors: Jun Yu, Gongpeng Zhao, Yongqi Wang, Zhihong Wei, Yang Zheng, Zerui Zhang, Zhongpeng Cai, Guochen Xie, Jichao Zhu, Wangyuan Zhu

    Abstract: This paper presents our approach for the VA (Valence-Arousal) estimation task in the ABAW6 competition. We devised a comprehensive model by preprocessing video frames and audio segments to extract visual and audio features. Through the utilization of Temporal Convolutional Network (TCN) modules, we effectively captured the temporal and spatial correlations between these features. Subsequently, we…

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  32. arXiv:2403.11757  [pdf, other]

    cs.MM cs.LG cs.SD eess.AS

    Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation

    Authors: Jun Yu, Wangyuan Zhu, Jichao Zhu

    Abstract: In this paper, we present the solution to the Emotional Mimicry Intensity (EMI) Estimation challenge, which is part of the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition. The EMI Estimation challenge task aims to evaluate the emotional intensity of seed videos by assessing them from a set of predefined emotion categories (i.e., "Admiration", "Amusement", "Determination", "Empathic Pain"…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  33. arXiv:2402.11834  [pdf, ps, other]

    eess.SY eess.SP

    Terahertz User-Centric Clustering in the Presence of Beam Misalignment

    Authors: Khaled Humadi, Imene Trigui, Wei-Ping Zhu, Wessam Ajib

    Abstract: Beam misalignment is one of the main challenges for the design of reliable wireless systems in terahertz (THz) bands. This paper investigates how to apply user-centric base station (BS) clustering as a valuable add-on in THz networks. In particular, to reduce the impact of beam misalignment, a user-centric BS clustering design that provides multi-connectivity via BS cooperation is investigated. Th…

    Submitted 18 February, 2024; originally announced February 2024.

  34. arXiv:2402.10388  [pdf]

    cs.CY eess.SP

    Improvising Age Verification Technologies in Canada: Technical, Regulatory and Social Dynamics

    Authors: Azfar Adib, Wei-Ping Zhu, M. Omair Ahmad

    Abstract: Age verification, which is a mandatory legal requirement for delivering certain age-appropriate services or products, has recently been emphasized around the globe to ensure online safety for children. The rapid advancement of artificial intelligence has facilitated the recent development of some cutting-edge age-verification technologies, particularly using biometrics. However, successful deploym…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Presented and accepted for publication in the 2023 IEEE International Humanitarian Technologies Conference (IEEE IHTC 2023), November 1 to 3, 2023, Cartagena, Colombia

  35. arXiv:2401.04154  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: Audio and video are the two most common modalities in mainstream media platforms, e.g., YouTube. To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy. For multimodal fusion, simply conc…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/1qvW52lamsvNGMCqPS7q8g8L4NaR_LlbR/view?usp=sharing. arXiv admin note: text overlap with arXiv:2401.04023

  36. arXiv:2401.04023  [pdf]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

    Authors: Wentao Zhu

    Abstract: In recent years, researchers have combined audio and video signals to deal with challenges where actions are not well represented or captured by visual cues. However, how to effectively leverage the two modalities is still under development. In this work, we develop a multiscale multimodal Transformer (MMT) that leverages hierarchical representation learning. Particularly, MMT is composed of a nove…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Accepted by WACV 2024; well-formatted PDF is in https://drive.google.com/file/d/10Zo_ydJZFAm7YsxHDgTjhyc4dEJbW_dk/view?usp=sharing

  37. arXiv:2312.16228  [pdf, other]

    cs.SD cs.LG cs.MM cs.NE eess.AS

    Deformable Audio Transformer for Audio Event Detection

    Authors: Wentao Zhu

    Abstract: Transformers have achieved promising results on a variety of tasks. However, the quadratic complexity in self-attention computation has limited the applications, especially in low-resource settings and mobile or edge devices. Existing works have proposed to exploit hand-crafted attention patterns to reduce computation complexity. However, such hand-crafted patterns are data-agnostic and may not be…

    Submitted 7 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024. arXiv admin note: substantial text overlap with arXiv:2201.00520 by other authors

  38. arXiv:2312.05786  [pdf, other]

    eess.SP cs.IT

    Deep Learning for Joint Design of Pilot, Channel Feedback, and Hybrid Beamforming in FDD Massive MIMO-OFDM Systems

    Authors: Junyi Yang, Weifeng Zhu, Shu Sun, Xiaofeng Li, Xingqin Lin, Meixia Tao

    Abstract: This letter considers the transceiver design in frequency division duplex (FDD) massive multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) systems for high-quality data transmission. We propose a novel deep learning based framework where the procedures of pilot design, channel feedback, and hybrid beamforming are realized by carefully crafted deep neural networ…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 5 pages, 4 figures, accepted by IEEE Communications Letters

  39. arXiv:2312.05557  [pdf, ps, other]

    cs.IT eess.SP

    Long-Term Rate-Fairness-Aware Beamforming Based Massive MIMO Systems

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, Y. Fang, H. V. Poor, L. Hanzo

    Abstract: This is the first treatise on multi-user (MU) beamforming designed for achieving long-term rate-fairness in full-dimensional MU massive multi-input multi-output (m-MIMO) systems. Explicitly, based on the channel covariances, which can be assumed to be known beforehand, we address this problem by optimizing the following objective functions: the users' signal-to-leakage-noise ratios (SLNRs) using SLN…

    Submitted 9 December, 2023; originally announced December 2023.

  40. arXiv:2311.14264  [pdf, ps, other]

    eess.SP

    An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint

    Authors: Xin Cheng, Guangjie Han, Jinlin Peng, Jinfang Jiang, Yu He, Weiqiang Zhu, Feng Shu, Jiangzhou Wang

    Abstract: Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for rece…

    Submitted 17 July, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  41. arXiv:2310.17155  [pdf, ps, other]

    cs.IT eess.SP

    Max-min Rate Optimization of Low-Complexity Hybrid Multi-User Beamforming Maintaining Rate-Fairness

    Authors: W. Zhu, H. D. Tuan, E. Dutkiewicz, H. V. Poor, L. Hanzo

    Abstract: A wireless network serving multiple users in the millimeter-wave or the sub-terahertz band by a base station is considered. High-throughput multi-user hybrid-transmit beamforming is conceived by maximizing the minimum rate of the users. For the sake of energy-efficient signal transmission, the array-of-subarrays structure is used for analog beamforming relying on low-resolution phase shifters. We…

    Submitted 26 October, 2023; originally announced October 2023.

  42. arXiv:2310.10095  [pdf, other]

    eess.IV cs.CV cs.LG

    A Multi-Scale Spatial Transformer U-Net for Simultaneously Automatic Reorientation and Segmentation of 3D Nuclear Cardiac Images

    Authors: Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu

    Abstract: Accurate reorientation and segmentation of the left ventricle (LV) is essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing. Small-scale LV myocardium (LV-MY) region detection and the diverse cardiac structures of i…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 17 pages, 7 figures

  43. arXiv:2308.12198  [pdf, other]

    eess.SP cs.IT

    Hierarchical Beam Alignment for Millimeter-Wave Communication Systems: A Deep Learning Approach

    Authors: Junyi Yang, Weifeng Zhu, Meixia Tao, Shu Sun

    Abstract: Fast and precise beam alignment is crucial for high-quality data transmission in millimeter-wave (mmWave) communication systems, where large-scale antenna arrays are utilized to overcome the severe propagation loss. To tackle the challenging problem, we propose a novel deep learning-based hierarchical beam alignment method for both multiple-input single-output (MISO) and multiple-input multiple-ou…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 15 pages, 16 figures, to appear in Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2209.03643

  44. arXiv:2308.04663  [pdf, other]

    eess.IV cs.CV cs.LG

    Classification of lung cancer subtypes on CT images with synthetic pathological priors

    Authors: Wentao Zhu, Yuan Jin, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas

    Abstract: The accurate diagnosis on pathological subtypes for lung cancer is of significant importance for the follow-up treatments and prognosis managements. In this paper, we propose self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies stating that cross-scale associations exist in the image patterns betwe…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 16 pages, 7 figures

    Journal ref: Medical Image Analysis 95, July 2024, 103199

  45. arXiv:2306.15942  [pdf, other]

    cs.SD cs.AI eess.AS

    Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

    Authors: Aoqi Guo, Junnan Wu, Peng Gao, Wenbo Zhu, Qinwen Guo, Dazhi Gao, Yujun Wang

    Abstract: Recently, deep learning-based beamforming algorithms have shown promising performance in target speech extraction tasks. However, most systems do not fully utilize spatial information. In this paper, we propose a target speech extraction network that utilizes spatial information to enhance the performance of neural beamformer. To achieve this, we first use the UNet-TCN structure to model input fea…

    Submitted 28 June, 2023; originally announced June 2023.

  46. arXiv:2306.11958  [pdf, other]

    physics.med-ph eess.IV

    PDS-MAR: a fine-grained Projection-Domain Segmentation-based Metal Artifact Reduction method for intraoperative CBCT images with guidewires

    Authors: Tianling Lyu, Zhan Wu, Gege Ma, Chen Jiang, Xinyun Zhong, Yan Xi, Yang Chen, Wentao Zhu

    Abstract: Since the invention of modern CT systems, metal artifacts have been a persistent problem. Due to increased scattering, amplified noise, and insufficient data collection, it is more difficult to suppress metal artifacts in cone-beam CT, limiting its use in human- and robot-assisted spine surgeries where metallic guidewires and screws are commonly used. In this paper, we demonstrate that conventiona…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 19 Pages

    Journal ref: Phys. Med. Biol. 68 215007 (2023)

  47. arXiv:2306.01289  [pdf, other]

    eess.IV cs.CV

    nnMobileNet: Rethinking CNN for Retinopathy Research

    Authors: Wenhui Zhu, Peijie Qiu, Xiwen Chen, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

    Abstract: Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD). Despite their success, the emergence of vision transformers (ViT) in the 2020s has shifted the trajectory of RD model development. The leading-edge performance of ViT-based models in RD can be largely credited to their scalability, their ability…

    Submitted 15 April, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted as a conference paper to 2024 CVPRW

  48. arXiv:2305.08014  [pdf]

    cs.CV cs.AI cs.LG eess.AS

    Surface EMG-Based Inter-Session/Inter-Subject Gesture Recognition by Leveraging Lightweight All-ConvNet and Transfer Learning

    Authors: Md. Rabiul Islam, Daniel Massicotte, Philippe Y. Massicotte, Wei-Ping Zhu

    Abstract: Gesture recognition using low-resolution instantaneous HD-sEMG images opens up new avenues for the development of more fluid and natural muscle-computer interfaces. However, the data variability between inter-session and inter-subject scenarios presents a great challenge. The existing approaches employed very large and complex deep ConvNet or 2SRNN-based domain adaptation methods to approximate th…

    Submitted 19 February, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

  49. arXiv:2304.09727  [pdf, other]

    eess.SP cs.IT

    Cooperative Multi-Cell Massive Access with Temporally Correlated Activity

    Authors: Weifeng Zhu, Meixia Tao, Xiaojun Yuan, Fan Xu, Yunfeng Guan

    Abstract: This paper investigates the problem of activity detection and channel estimation in cooperative multi-cell massive access systems with temporally correlated activity, where all access points (APs) are connected to a central unit via fronthaul links. We propose to perform user-centric AP cooperation for computation burden alleviation and introduce a generalized sliding-window detection strategy for…

    Submitted 19 April, 2023; originally announced April 2023.

    Comments: 16 pages, 17 figures, minor revision

  50. arXiv:2303.10757  [pdf, other]

    cs.SD cs.AI cs.CV cs.LG eess.AS

    Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

    Authors: Wentao Zhu, Mohamed Omar

    Abstract: Audio event has a hierarchical architecture in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. Specifically, MAST employs one-dimensional (and two-dimensional) pooling operators along…

    Submitted 19 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023