Showing 1–50 of 68 results for author: Xie, W

Searching in archive eess.
  1. arXiv:2506.04682   

    cs.CV eess.SP

    MARS: Radio Map Super-resolution and Reconstruction Method under Sparse Channel Measurements

    Authors: Chuyun Deng, Na Liu, Wei Xie, Lianming Xu, Li Wang

    Abstract: Radio maps reflect the spatial distribution of signal strength and are essential for applications like smart cities, IoT, and wireless network planning. However, reconstructing accurate radio maps from sparse measurements remains challenging. Traditional interpolation and inpainting methods lack environmental awareness, while many deep learning approaches depend on detailed scene data, limiting ge…

    Submitted 8 July, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: The authors withdraw this submission to substantially revise the introduction and experimental sections and incorporate new content. The manuscript has not been submitted or published elsewhere. A revised version may be submitted in the future

  2. arXiv:2506.03238  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach

    Authors: Ziheng Zhao, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Automated interpretation of CT images-particularly localizing and describing abnormal findings across multi-plane and whole-body scans-remains a significant challenge in clinical radiology. This work aims to address this challenge through four key contributions: (i) On taxonomy, we collaborate with senior radiologists to propose a comprehensive hierarchical classification system, with 404 represen…

    Submitted 3 June, 2025; originally announced June 2025.

  3. arXiv:2504.05578  [pdf, other]

    cs.IT eess.SP

    Recent Advances in Near-Field Beam Training and Channel Estimation for XL-MIMO Systems

    Authors: Ming Zeng, Ji Wang, Xingwang Li, Wanming Hao, Zheng Chu, Wenwu Xie, Xianbin Wang, Quoc-Viet Pham

    Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a key technology for next-generation wireless communication systems. By deploying significantly more antennas than conventional massive MIMO systems, XL-MIMO promises substantial improvements in spectral efficiency. However, due to the drastically increased array size, the conventional planar wave channel model is no longer accurate…

    Submitted 7 April, 2025; originally announced April 2025.

    Comments: Submitted to IEEE Wireless Communications; 8 pages; 6 figures

  4. arXiv:2503.16543  [pdf, other]

    eess.IV cs.CV

    Comprehensive Review of Reinforcement Learning for Medical Ultrasound Imaging

    Authors: Hanae Elmekki, Saidul Islam, Ahmed Alagha, Hani Sami, Amanda Spilkin, Ehsan Zakeri, Antonela Mariel Zanuttini, Jamal Bentahar, Lyes Kadem, Wen-Fang Xie, Philippe Pibarot, Rabeb Mizouni, Hadi Otrok, Shakti Singh, Azzam Mourad

    Abstract: Medical Ultrasound (US) imaging has seen increasing demands over the past years, becoming one of the most preferred imaging modalities in clinical practice due to its affordability, portability, and real-time capabilities. However, it faces several challenges that limit its applicability, such as operator dependency, variability in interpretation, and limited resolution, which are amplified by the…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 89 pages, 23 figures

  5. arXiv:2503.06743  [pdf, ps, other]

    eess.IV cs.CV

    GlaGAN: A Generative Unsupervised Model for High-Precision Segmentation of Retinal Main Vessels toward Early Detection of Glaucoma

    Authors: Cheng Huang, Weizheng Xie, Tsengdar J. Lee, Jui-Kai Wang, Karanjit Kooner, Ning Zhang, Jia Zhang

    Abstract: Structural changes in the main retinal blood vessels are critical biomarkers for glaucoma onset and progression. Identifying these vessels is essential for vascular modeling yet highly challenging. This paper introduces GlaGAN, an unsupervised generative AI model for segmenting main blood vessels in Optical Coherence Tomography Angiography (OCTA) images. The process begins with the Space Colonizat…

    Submitted 7 July, 2025; v1 submitted 9 March, 2025; originally announced March 2025.

  6. arXiv:2503.04653  [pdf, other]

    cs.CV cs.IR eess.IV

    RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining

    Authors: Tengfei Zhang, Ziheng Zhao, Chaoyi Wu, Xiao Zhou, Ya Zhang, Yangfeng Wang, Weidi Xie

    Abstract: Developing advanced medical imaging retrieval systems is challenging due to the varying definitions of `similar images' across different medical contexts. This challenge is compounded by the lack of large-scale, high-quality medical imaging retrieval datasets and benchmarks. In this paper, we propose a novel methodology that leverages dense radiology reports to define image-wise similarity orderin…

    Submitted 6 March, 2025; originally announced March 2025.

  7. arXiv:2502.20762  [pdf, other]

    eess.IV cs.CV

    Towards Practical Real-Time Neural Video Compression

    Authors: Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu

    Abstract: We introduce a practical real-time neural video codec (NVC) designed to deliver high compression ratio, low latency and broad versatility. In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational operational costs, such as memory I/O and the number of function calls. While most efficient NVCs prioritize reducing computational cost, we identify operational c…

    Submitted 18 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: CVPR 2025. Visit the project page at https://dcvccodec.github.io and access the code at https://github.com/microsoft/DCVC

  8. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  9. arXiv:2412.13126  [pdf, other]

    eess.IV cs.CV

    A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis

    Authors: Xiao Zhou, Luoyi Sun, Dexuan He, Wenbin Guan, Ruifen Wang, Lifeng Wang, Xin Sun, Kun Sun, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: Deep learning has enabled the development of highly robust foundation models for various pathological tasks across diverse diseases and patient cohorts. Among these models, vision-language pre-training, which leverages large-scale paired data to align pathology image and text embedding spaces, and provides a novel zero-shot paradigm for downstream tasks. However, existing models have been primaril…

    Submitted 17 December, 2024; originally announced December 2024.

  10. arXiv:2409.07171  [pdf, ps, other]

    eess.IV cs.CV

    AC-IND: Sparse CT reconstruction based on attenuation coefficient estimation and implicit neural distribution

    Authors: Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko

    Abstract: Computed tomography (CT) reconstruction plays a crucial role in industrial nondestructive testing and medical diagnosis. Sparse view CT reconstruction aims to reconstruct high-quality CT images while only using a small number of projections, which helps to improve the detection speed of industrial assembly lines and is also meaningful for reducing radiation in medical scenarios. Sparse CT reconstr…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages

  11. arXiv:2409.01695  [pdf, other]

    cs.SD cs.AI eess.AS

    USTC-KXDIGIT System Description for ASVspoof5 Challenge

    Authors: Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

    Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend f…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: ASVspoof5 workshop paper

  12. arXiv:2408.15555  [pdf, other]

    eess.IV cs.CV cs.LG

    GlaLSTM: A Concurrent LSTM Stream Framework for Glaucoma Detection via Biomarker Mining

    Authors: Cheng Huang, Weizheng Xie, Jian Zhou, Tsengdar Lee, Karanjit Kooner, Jia Zhang

    Abstract: Glaucoma is a complex group of eye diseases marked by optic nerve damage, commonly linked to elevated intraocular pressure and biomarkers like retinal nerve fiber layer thickness. Understanding how these biomarkers interact is crucial for unraveling glaucoma's underlying mechanisms. In this paper, we propose GlaLSTM, a novel concurrent LSTM stream framework for glaucoma detection, leveraging laten…

    Submitted 27 March, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

  13. arXiv:2408.08496  [pdf, other]

    cs.NI eess.SP

    Generative AI for Energy Harvesting Internet of Things Network: Fundamental, Applications, and Opportunities

    Authors: Wenwen Xie, Geng Sun, Jiahui Li, Jiacheng Wang, Hongyang Du, Dusit Niyato, Octavia A. Dobre

    Abstract: Internet of Things (IoT) devices are typically powered by small-sized batteries with limited energy storage capacity, requiring regular replacement or recharging. To reduce costs and maintain connectivity in IoT networks, energy harvesting technologies are regarded as a promising solution. Notably, due to its robust analytical and generative capabilities, generative artificial intelligence (GenAI)…

    Submitted 15 August, 2024; originally announced August 2024.

  14. arXiv:2407.16684  [pdf, other]

    eess.IV cs.CV q-bio.NC

    AutoRG-Brain: Grounded Report Generation for Brain MRI

    Authors: Jiayu Lei, Xiaoman Zhang, Chaoyi Wu, Lisong Dai, Ya Zhang, Yanyong Zhang, Yanfeng Wang, Weidi Xie, Yuehua Li

    Abstract: Radiologists are tasked with interpreting a large number of images in a daily base, with the responsibility of generating corresponding reports. This demanding workload elevates the risk of human error, potentially leading to treatment delays, increased healthcare costs, revenue loss, and operational inefficiencies. To address these challenges, we initiate a series of work on grounded Automatic Re…

    Submitted 29 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  15. Reconfigurable Intelligent Surface for Sensing, Communication, and Computation: Perspectives, Challenges, and Opportunities

    Authors: Bin Li, Wancheng Xie, Zesong Fei

    Abstract: Forthcoming 6G networks have two predominant features of wide coverage and sufficient computation capability. To support the promising applications, Integrated Sensing, Communication, and Computation (ISCC) has been considered as a vital enabler by completing the computation of raw data to achieve accurate environmental sensing. To help the ISCC networks better support the comprehensive services o…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: vol. 7, no. 4, pp. 36-42, Jul. 2024

  16. arXiv:2406.00956  [pdf, other]

    cs.CV cs.LG eess.IV

    Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation

    Authors: Tianyu Huang, Tao Zhou, Weidi Xie, Shuo Wang, Qi Dou, Yizhe Zhang

    Abstract: The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entai…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Project Link: https://sam-auxol.github.io/AuxOL/

  17. arXiv:2404.10556  [pdf, other]

    cs.NI eess.SP

    Generative AI for Advanced UAV Networking

    Authors: Geng Sun, Wenwen Xie, Dusit Niyato, Hongyang Du, Jiawen Kang, Jing Wu, Sumei Sun, Ping Zhang

    Abstract: With the impressive achievements of chatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve the problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, we discuss key applications of GAI in improving unmanned aerial veh…

    Submitted 16 April, 2024; originally announced April 2024.

  18. arXiv:2401.16423  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Synchformer: Efficient Synchronization from Sparse Cues

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Extended version of the ICASSP 24 paper. Project page: https://www.robots.ox.ac.uk/~vgg/research/synchformer/ Code: https://github.com/v-iashin/Synchformer

  19. Wideband Beamforming for RIS Assisted Near-Field Communications

    Authors: Ji Wang, Jian Xiao, Yixuan Zou, Wenwu Xie, Yuanwei Liu

    Abstract: A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with frequency-dependent hybrid…

    Submitted 7 January, 2025; v1 submitted 20 January, 2024; originally announced January 2024.

    Journal ref: IEEE Transactions on Wireless Communications, 2024

  20. arXiv:2312.17183  [pdf, other]

    eess.IV cs.CV

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    Authors: Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this study, we aim to build up a model that can Segment Anything in radiology scans, driven by medical terminologies as Text prompts, termed as SAT. Our main contributions are three folds: (i) for dataset construction, we construct the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; Then, we build up the largest and most comprehensive segmentation dat…

    Submitted 5 February, 2025; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: 69 pages

  21. arXiv:2311.18788  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM physics.med-ph

    Automated interpretation of congenital heart disease from multi-view echocardiograms

    Authors: Jing Wang, Xiaofeng Liu, Fangyun Wang, Lin Zheng, Fengqiao Gao, Hanwen Zhang, Xin Zhang, Wanqing Xie, Binbin Wang

    Abstract: Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonate death in China. Clinical diagnosis can be based on the selected 2D key-frames from five views. Limited by the availability of multi-view data, most methods have to rely on the insufficient single view analysis. This study proposes to automatically analyze the multi-view echocardiograms with a practical…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: Published in Medical Image Analysis

    Journal ref: Medical Image Analysis (Volume 69, April 2021, 101942)

  22. arXiv:2311.17624  [pdf, other]

    eess.SP cs.NI

    Combating Multi-path Interference to Improve Chirp-based Underwater Acoustic Communication

    Authors: Wenjun Xie, Enqi Zhang, Lizhao You, Deqing Wang, Zhaorui Wang, Liqun Fu

    Abstract: Linear chirp-based underwater acoustic communication has been widely used due to its reliability and long-range transmission capability. However, unlike the counterpart chirp technology in wireless -- LoRa, its throughput is severely limited by the number of modulated chirps in a symbol. The fundamental challenge lies in the underwater multi-path channel, where the delayed copied of one symbol may…

    Submitted 29 November, 2023; originally announced November 2023.

  23. Adaptive Digital Twin for UAV-Assisted Integrated Sensing, Communication, and Computation Networks

    Authors: Bin Li, Wenshuai Liu, Wancheng Xie, Ning Zhang, Yan Zhang

    Abstract: In this paper, we study a digital twin (DT)-empowered integrated sensing, communication, and computation network. Specifically, the users perform radar sensing and computation offloading on the same spectrum, while unmanned aerial vehicles (UAVs) are deployed to provide edge computing service. We first formulate a multi-objective optimization problem to minimize the beampattern performance of mult…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 14 pages, 11 figures

    Journal ref: IEEE Transactions on Green Communications and Networking, 2023

  24. arXiv:2310.15371  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG physics.med-ph

    Vicinal Feature Statistics Augmentation for Federated 3D Medical Volume Segmentation

    Authors: Yongsong Huang, Wanqing Xie, Mingzhen Li, Mingmei Cheng, Jinzhou Wu, Weixiao Wang, Jane You, Xiaofeng Liu

    Abstract: Federated learning (FL) enables multiple client medical institutes collaboratively train a deep learning (DL) model with privacy protection. However, the performance of FL can be constrained by the limited availability of labeled data in small institutes and the heterogeneous (i.e., non-i.i.d.) data distribution across institutes. Though data augmentation has been a proven technique to boost the g…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023): Oral Paper

    Journal ref: In: Frangi, A., de Bruijne, M., Wassermann, D., Navab, N. (eds) Information Processing in Medical Imaging. IPMI 2023. Lecture Notes in Computer Science, vol 13939. Springer, Cham

  25. arXiv:2309.11500  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning

    Authors: Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

    Abstract: Recently, the AI community has made significant strides in developing powerful foundation models, driven by large-scale multimodal datasets. However, for audio representation learning, existing datasets suffer from limitations in the following aspects: insufficient volume, simplistic content, and arduous collection procedures. To establish an audio dataset with high-quality captions, we propose an…

    Submitted 9 September, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: Accepted by ACM MM 2024

  26. arXiv:2309.02576  [pdf, other]

    eess.IV cs.CV cs.LG

    Emphysema Subtyping on Thoracic Computed Tomography Scans using Deep Neural Networks

    Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Dirk Jan Slebos, Bram van Ginneken

    Abstract: Accurate identification of emphysema subtypes and severity is crucial for effective management of COPD and the study of disease heterogeneity. Manual analysis of emphysema subtypes and severity is laborious and subjective. To address this challenge, we present a deep learning-based approach for automating the Fleischner Society's visual score system for emphysema subtyping and severity analysis. W…

    Submitted 5 September, 2023; originally announced September 2023.

    Journal ref: Sci Rep. 2023 Aug 29;13(1):14147

  27. arXiv:2308.11980  [pdf, other]

    eess.AS cs.SD

    Joint Prediction of Audio Event and Annoyance Rating in an Urban Soundscape by Hierarchical Graph Representation Learning

    Authors: Yuanbo Hou, Siyang Song, Cheng Luo, Andrew Mitchell, Qiaoqiao Ren, Weicheng Xie, Jian Kang, Wenwu Wang, Dick Botteldooren

    Abstract: Sound events in daily life carry rich information about the objective world. The composition of these sounds affects the mood of people in a soundscape. Most previous approaches only focus on classifying and detecting audio events and scenes, but may ignore their perceptual quality that may impact humans' listening mood for the environment, e.g. annoyance. To this end, this paper proposes a novel…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: INTERSPEECH 2023, Code and models: https://github.com/Yuanbo2020/HGRL

  28. arXiv:2307.12717  [pdf, ps, other]

    cs.CV eess.IV

    Dense Transformer based Enhanced Coding Network for Unsupervised Metal Artifact Reduction

    Authors: Wangduo Xie, Matthew B. Blaschko

    Abstract: CT images corrupted by metal artifacts have serious negative effects on clinical diagnosis. Considering the difficulty of collecting paired data with ground truth in clinical settings, unsupervised methods for metal artifact reduction are of high interest. However, it is difficult for previous unsupervised methods to retain structural information from CT images while handling the non-local charact…

    Submitted 28 July, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  29. arXiv:2306.02054  [pdf]

    eess.AS

    Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

    Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Qisheng Huang, Wenfeng Pang, Qianhua He

    Abstract: We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely the subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples of multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, and there is mismatch of recording dev…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP)

  30. arXiv:2306.02053  [pdf]

    eess.AS

    Few-shot Class-incremental Audio Classification Using Stochastic Classifier

    Authors: Yanxiong Li, Wenchang Cao, Jialong Li, Wei Xie, Qianhua He

    Abstract: It is generally assumed that number of classes is fixed in current audio classification methods, and the model can recognize pregiven classes only. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and even infeasible. In this study, we propose a method for fewshot class-incremental aud…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 2023

  31. Few-shot Class-incremental Audio Classification Using Dynamically Expanded Classifier with Self-attention Modified Prototypes

    Authors: Yanxiong Li, Wenchang Cao, Wei Xie, Jialong Li, Emmanouil Benetos

    Abstract: Most existing methods for audio classification assume that the vocabulary of audio classes to be classified is fixed. When novel (unseen) audio classes appear, audio classification systems need to be retrained with abundant labeled samples of all audio classes for recognizing base (initial) and novel audio classes. If novel audio classes continue to appear, the existing methods for audio classific…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 13 pages, 8 figures, 12 tables. Accepted for publication in IEEE TMM

  32. arXiv:2305.18045  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

    Authors: Wei Xie, Yanxiong Li, Qianhua He, Wenchang Cao, Tuomas Virtanen

    Abstract: New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned on…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 5 pages, 2 figures. Accepted by Interspeech 2023

  33. arXiv:2303.10372  [pdf, other]

    cs.CV cs.MM eess.IV

    Just Noticeable Visual Redundancy Forecasting: A Deep Multimodal-driven Approach

    Authors: Wuyuan Xie, Shukang Wang, Sukun Tian, Lirong Huang, Ye Liu, Miaohui Wang

    Abstract: Just noticeable difference (JND) refers to the maximum visual change that human eyes cannot perceive, and it has a wide range of applications in multimedia systems. However, most existing JND approaches only focus on a single modality, and rarely consider the complementary effects of multimodal information. In this article, we investigate the JND modeling from an end-to-end homologous multimodal p…

    Submitted 18 March, 2023; originally announced March 2023.

    Journal ref: AAAI 2023

  34. Energy Efficient Computation Offloading in Aerial Edge Networks With Multi-Agent Cooperation

    Authors: Wenshuai Liu, Bin Li, Wancheng Xie, Yueyue Dai, Zesong Fei

    Abstract: With the high flexibility of supporting resource-intensive and time-sensitive applications, unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) is proposed as an innovational paradigm to support the mobile users (MUs). As a promising technology, digital twin (DT) is capable of timely mapping the physical entities to virtual models, and reflecting the MEC network state in real-time.…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 14 pages, 13 figures

  35. arXiv:2301.02228  [pdf, other]

    eess.IV cs.CL cs.CV

    MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

    Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

    Abstract: In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information,…

    Submitted 3 April, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  36. arXiv:2210.07055  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

    Authors: Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

    Abstract: The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For such videos, the events that may be harnessed for synchronisation cues may be spatially small and may occur only infrequently during a many seconds-long video clip, i.e. the synchronisation signal is 'sparse in space and time'. This contrasts with the case of synchronising videos of talking heads, wher…

    Submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted as a spotlight presentation for the BMVC 2022. Code: https://github.com/v-iashin/SparseSync Project page: https://v-iashin.github.io/SparseSync

  37. arXiv:2209.05477  [pdf, other]

    eess.IV cs.CV cs.LG

    Adaptive 3D Localization of 2D Freehand Ultrasound Brain Images

    Authors: Pak-Hei Yeung, Moska Aliasi, Monique Haak, The INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

    Abstract: Two-dimensional (2D) freehand ultrasound is the mainstay in prenatal care and fetal growth monitoring. The task of matching corresponding cross-sectional planes in the 3D anatomy for a given 2D ultrasound brain scan is essential in freehand scanning, but challenging. We propose AdLocUI, a framework that Adaptively Localizes 2D Ultrasound Images in the 3D anatomical atlas without using any external…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) 2022

  38. Scale-free and Task-agnostic Attack: Generating Photo-realistic Adversarial Patterns with Patch Quilting Generator

    Authors: Xiangbo Gao, Cheng Luo, Qinliang Lin, Weicheng Xie, Minmin Liu, Linlin Shen, Keerthy Kusumam, Siyang Song

    Abstract: Traditional L_p norm-restricted image attack algorithms suffer from poor transferability to black box scenarios and poor robustness to defense algorithms. Recent CNN generator-based attack approaches can synthesize unrestricted and semantically meaningful entities to the image, which is shown to be transferable and robust. However, such methods attack images by either synthesizing local…

    Submitted 19 November, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

    Journal ref: IEEE Transactions on Information Forensics and Security, Vol. 19, 2024, pp. 4385-4400

  39. arXiv:2206.12772  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

    Authors: Jinxiang Liu, Chen Ju, Weidi Xie, Ya Zhang

    Abstract: We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos. To understand what enables to learn useful representations, we systematically investigate the effects of data augmentations, and reveal that (1) composition of data augmentations plays a critical role, i.e. explicitly encouraging the audio-visual representat…

    Submitted 15 August, 2022; v1 submitted 25 June, 2022; originally announced June 2022.

    Comments: Camera-ready Version for ACMMM 2022, Project page is https://jinxiang-liu.github.io/SSL-TIE/

  40. arXiv:2206.06947  [pdf, other]

    eess.IV cs.CV

    K-Space Transformer for Undersampled MRI Reconstruction

    Authors: Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya Zhang

    Abstract: This paper considers the problem of undersampled MRI reconstruction. We propose a novel Transformer-based framework for directly processing signal in k-space, going beyond the limitation of regular grids as ConvNets do. We adopt an implicit representation of k-space spectrogram, treating spatial coordinates as inputs, and dynamically query the sparsely sampled points to reconstruct the spectrogram…

    Submitted 10 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

  41. arXiv:2205.03920  [pdf, other]

    q-bio.QM eess.SY

    From Discovery to Production: Challenges and Novel Methodologies for Next Generation Biomanufacturing

    Authors: Wei Xie, Giulia Pedrielli

    Abstract: The increasingly pressing demand for novel drugs (e.g., gene therapies for personalized cancer care, ever evolving vaccines) with unprecedented levels of personalization, has put remarkable pressure on the traditionally long time required by the pharma R&D and manufacturing to go from design to production of new products. The revolution has already brought important changes in the technologies us…

    Submitted 28 June, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: 15 pages, 5 figures

  42. arXiv:2205.03229  [pdf]

    eess.SP physics.optics

    Multi-core fiber enabled fading noise suppression in φ-OFDR based quantitative distributed vibration sensing

    Authors: Yuxiang Feng, Weilin Xie, Yinxia Meng, Jiang Yang, Qiang Yang, Yan Ren, Tianwai Bo, Zhongwei Tan, Wei Wei, Yi Dong

    Abstract: Coherent fading has been regarded as a critical issue in phase-sensitive optical frequency domain reflectometry (φ-OFDR) based distributed fiber-optic sensing. Here, we report on an approach for fading noise suppression in φ-OFDR with multi-core fiber. By exploiting the independent nature of the randomness in the distribution of refractive index in each of the cores, the drastic phase fluctuations…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 4 pages

  43. arXiv:2203.08980  [pdf, other]

    stat.ME eess.SY

    Stochastic Simulation Uncertainty Analysis to Accelerate Flexible Biomanufacturing Process Development

    Authors: Wei Xie, Russell R. Barton, Barry L. Nelson, Keqi Wang

    Abstract: Motivated by critical challenges and needs from biopharmaceuticals manufacturing, we propose a general metamodel-assisted stochastic simulation uncertainty analysis framework to accelerate the development of a simulation model with modular design for flexible production processes. There are often very limited process observations. Thus, there exist both simulation and model uncertainties in the sy…

    Submitted 3 September, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: 32 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2011.04207

  44. arXiv:2201.03116  [pdf, other]

    eess.SY cs.LG

    Opportunities of Hybrid Model-based Reinforcement Learning for Cell Therapy Manufacturing Process Control

    Authors: Hua Zheng, Wei Xie, Keqi Wang, Zheng Li

    Abstract: Driven by the key challenges of cell therapy manufacturing, including high complexity, high uncertainty, and very limited process observations, we propose a hybrid model-based reinforcement learning (RL) to efficiently guide process control. We first create a probabilistic knowledge graph (KG) hybrid model characterizing the risk- and science-based understanding of biomanufacturing process mechani…

    Submitted 25 January, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

    Comments: 14 pages, 2 figures

  45. arXiv:2112.07948  [pdf, other]

    cs.CV eess.IV

    Transcoded Video Restoration by Temporal Spatial Auxiliary Network

    Authors: Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

    Abstract: In most video platforms, such as YouTube and TikTok, the played videos usually have undergone multiple video encodings such as hardware encoding by recording devices, software encoding by video editing apps, and single/multiple video transcoding by video application servers. Previous works in compressed video restoration typically assume the compression artifacts are caused by one-time encoding.…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI2022

  46. arXiv:2112.04432  [pdf, other]

    cs.CV eess.AS

    Audio-Visual Synchronisation in the wild

    Authors: Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this paper, we consider the problem of audio-visual synchronisation applied to videos 'in-the-wild' (i.e., of general classes beyond speech). As a new task, we identify and curate a test set with high audio-visual correlation, namely VGG-Sound Sync. We compare a number of transformer-based architectural variants specifically designed to model audio and visual signals of arbitrary length, while sig…

    Submitted 8 December, 2021; originally announced December 2021.

  47. arXiv:2109.12108  [pdf, other]

    eess.IV cs.CV

    ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation

    Authors: Pak-Hei Yeung, Linde Hesse, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

    Abstract: The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation. In contrast to the conventional way that represents a 3D volume as a discrete voxel grid, we do so by parameterizing it as the zero level-set of a continuous function, i.e. implicitly representing the 3D volume as a mapping from the spatia…

    Submitted 24 September, 2021; originally announced September 2021.

  48. arXiv:2107.13431  [pdf]

    eess.IV cs.CV

    AI assisted method for efficiently generating breast ultrasound screening reports

    Authors: Shuang Ge, Qiongyu Ye, Wenquan Xie, Desheng Sun, Huabin Zhang, Xiaobo Zhou, Kehong Yuan

    Abstract: Background: Ultrasound is one of the preferred choices for early screening of dense breast cancer. Clinically, doctors have to manually write the screening report which is time-consuming and laborious, and it is easy to miss and miswrite. Aim: We proposed a new pipeline to automatically generate AI breast ultrasound screening reports based on ultrasound images, aiming to assist doctors in improvin…

    Submitted 22 May, 2022; v1 submitted 28 July, 2021; originally announced July 2021.

  49. arXiv:2106.01351  [pdf, other]

    eess.IV cs.CV

    Deep Clustering Activation Maps for Emphysema Subtyping

    Authors: Weiyi Xie, Colin Jacobs, Bram van Ginneken

    Abstract: We propose a deep learning clustering method that exploits dense features from a segmentation network for emphysema subtyping from computed tomography (CT) scans. Using dense features enables high-resolution visualization of image regions corresponding to the cluster assignment via dense clustering activation maps (dCAMs). This approach provides model interpretability. We evaluated clustering resu…

    Submitted 1 June, 2021; originally announced June 2021.

  50. arXiv:2105.11748  [pdf, other]

    eess.IV cs.CV cs.LG

    Dense Regression Activation Maps For Lesion Segmentation in CT scans of COVID-19 patients

    Authors: Weiyi Xie, Colin Jacobs, Jean-Paul Charbonnier, Bram van Ginneken

    Abstract: Automatic lesion segmentation on thoracic CT enables rapid quantitative analysis of lung involvement in COVID-19 infections. However, obtaining a large amount of voxel-level annotations for training segmentation networks is prohibitively expensive. Therefore, we propose a weakly-supervised segmentation method based on dense regression activation maps (dRAMs). Most weakly-supervised segmentation ap…

    Submitted 18 November, 2021; v1 submitted 25 May, 2021; originally announced May 2021.