Showing 1–50 of 115 results for author: Tang, X

Searching in archive eess.
  1. arXiv:2506.01243  [pdf, ps, other]

    eess.SP

    Energy-Efficient Integrated Communication and Computation via Non-Terrestrial Networks with Uncertainty Awareness

    Authors: Xiao Tang, Yudan Jiang, Ruonan Zhang, Qinghe Du, Jinxin Liu, Naijin Liu

    Abstract: Non-terrestrial network (NTN)-based integrated communication and computation empowers various emerging applications with global coverage. Yet this vision is severely challenged by the energy issue given the limited energy supply of NTN nodes and the energy-consuming nature of communication and computation. In this paper, we investigate the energy-efficient integrated communication and computation… ▽ More

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted @ IEEE IoTJ

  2. arXiv:2505.17543  [pdf, ps, other]

    cs.SD cs.MM eess.AS

    MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation

    Authors: Kaixing Yang, Xulong Tang, Ziqiao Peng, Yuxuan Hu, Jun He, Hongyan Liu

    Abstract: Music-driven 3D dance generation has attracted increasing attention in recent years, with promising applications in choreography, virtual reality, and creative content creation. Previous research has generated promising realistic dance movement from audio signals. However, traditional methods underutilize genre conditioning, often treating it as auxiliary modifiers rather than core semantic driver… ▽ More

    Submitted 31 May, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2505.14222

  3. arXiv:2505.14222  [pdf, other]

    cs.SD cs.GR cs.MM eess.AS

    MatchDance: Collaborative Mamba-Transformer Architecture Matching for High-Quality 3D Dance Synthesis

    Authors: Kaixing Yang, Xulong Tang, Yuxuan Hu, Jiahao Yang, Hongyan Liu, Qinnan Zhang, Jun He, Zhaoxin Fan

    Abstract: Music-to-dance generation represents a challenging yet pivotal task at the intersection of choreography, virtual reality, and creative content generation. Despite its significance, existing methods face substantial limitation in achieving choreographic consistency. To address the challenge, we propose MatchDance, a novel framework for music-to-dance generation that constructs a latent representati… ▽ More

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  4. arXiv:2505.11516  [pdf, other]

    cs.RO eess.IV

    SELECT: A Submodular Approach for Active LiDAR Semantic Segmentation

    Authors: Ruiyu Mao, Sarthak Kumar Maharana, Xulong Tang, Yunhui Guo

    Abstract: LiDAR-based semantic segmentation plays a vital role in autonomous driving by enabling detailed understanding of 3D environments. However, annotating LiDAR point clouds is extremely costly and requires assigning semantic labels to millions of points with complex geometric structures. Active Learning (AL) has emerged as a promising approach to reduce labeling costs by querying only the most informa… ▽ More

    Submitted 6 May, 2025; originally announced May 2025.

  5. arXiv:2505.11248  [pdf, ps, other]

    eess.SP cs.IT

    Unfolded Deep Graph Learning for Networked Over-the-Air Computation

    Authors: Xiao Tang, Huirong Xiao, Chao Shen, Li Sun, Qinghe Du, Dusit Niyato, Zhu Han

    Abstract: Over-the-air computation (AirComp) has emerged as a promising technology that enables simultaneous transmission and computation through wireless channels. In this paper, we investigate the networked AirComp in multiple clusters allowing diversified data computation, which is yet challenged by the transceiver coordination and interference management therein. Particularly, we aim to maximize the mul… ▽ More

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted @ IEEE TWC

  6. arXiv:2504.01446  [pdf, other]

    eess.SP cs.IT

    Deep Graph Reinforcement Learning for UAV-Enabled Multi-User Secure Communications

    Authors: Xiao Tang, Kexin Zhao, Chao Shen, Qinghe Du, Yichen Wang, Dusit Niyato, Zhu Han

    Abstract: While unmanned aerial vehicles (UAVs) with flexible mobility are envisioned to enhance physical layer security in wireless communications, the efficient security design that adapts to such high network dynamics is rather challenging. The conventional approaches extended from optimization perspectives are usually quite involved, especially when jointly considering factors in different scales such a… ▽ More

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted at IEEE TMC

  7. arXiv:2503.20499  [pdf, other]

    cs.SD eess.AS

    FireRedTTS-1S: An Upgraded Streamable Foundation Text-to-Speech System

    Authors: Hao-Han Guo, Yao Hu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie

    Abstract: In this work, we upgrade FireRedTTS to a new version, FireRedTTS-1S, a high-quality streaming foundation text-to-speech system. FireRedTTS-1S achieves streaming speech generation via two steps: text-to-semantic decoding and semantic-to-acoustic decoding. In text-to-semantic decoding, a semantic-aware speech tokenizer converts the speech signal into semantic tokens, which can be synthesized from th… ▽ More

    Submitted 26 May, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  8. arXiv:2503.09631  [pdf, other]

    cs.GR eess.IV

    V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video

    Authors: Jianqi Chen, Biao Zhang, Xiangjun Tang, Peter Wonka

    Abstract: We present V2M4, a novel 4D reconstruction method that directly generates a usable 4D mesh animation asset from a single monocular video. Unlike existing approaches that rely on priors from multi-view image and video generation models, our method is based on native 3D mesh generation models. Naively applying 3D mesh generation models to generate a mesh for each frame in a 4D task can lead to issue… ▽ More

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Project page: https://windvchen.github.io/V2M4/

  9. arXiv:2502.12180  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    ClusMFL: A Cluster-Enhanced Framework for Modality-Incomplete Multimodal Federated Learning in Brain Imaging Analysis

    Authors: Xinpeng Wang, Rong Zhou, Han Xie, Xiaoying Tang, Lifang He, Carl Yang

    Abstract: Multimodal Federated Learning (MFL) has emerged as a promising approach for collaboratively training multimodal models across distributed clients, particularly in healthcare domains. In the context of brain imaging analysis, modality incompleteness presents a significant challenge, where some institutions may lack specific imaging modalities (e.g., PET, MRI, or CT) due to privacy concerns, device… ▽ More

    Submitted 14 February, 2025; originally announced February 2025.

  10. arXiv:2502.07012  [pdf, ps, other]

    eess.SP

    Bayesian Beamforming for Integrated Sensing and Communication Systems

    Authors: Zongyao Zhao, Zhenyu Liu, Wei Dai, Xinke Tang, Xiao-Ping Zhang, Yuhan Dong

    Abstract: The uncertainty of the sensing target brings great challenge to the beamforming design of the integrated sensing and communication (ISAC) system. To address this issue, we model the scattering coefficient and azimuth angle of the target as random variables and introduce a novel metric, expected detection probability (EPd), to quantify the average detection performance from a Bayesian perspective.… ▽ More

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 6 pages, 6 figures

  11. arXiv:2501.14350  [pdf, other]

    eess.AS cs.SD

    FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

    Authors: Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu

    Abstract: We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants: FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder… ▽ More

    Submitted 24 January, 2025; originally announced January 2025.

  12. arXiv:2501.14264  [pdf, other]

    eess.IV cs.CV

    CDI: Blind Image Restoration Fidelity Evaluation based on Consistency with Degraded Image

    Authors: Xiaojun Tang, Jingru Wang, Guangwei Huang, Guannan Chen, Rui Zheng, Lian Huai, Yuyu Liu, Xingqun Jiang

    Abstract: Recent advancements in Blind Image Restoration (BIR) methods, based on Generative Adversarial Networks and Diffusion Models, have significantly improved visual quality. However, they present significant challenges for Image Quality Assessment (IQA), as the existing Full-Reference IQA methods often rate images with high perceptual quality poorly. In this paper, we reassess the Solution Non-Uniquene… ▽ More

    Submitted 24 January, 2025; originally announced January 2025.

  13. Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks

    Authors: Shuang Cui, Yi Li, Jiangmeng Li, Xiongxin Tang, Bing Su, Fanjiang Xu, Hui Xiong

    Abstract: Single image defocus deblurring (SIDD) aims to restore an all-in-focus image from a defocused one. Distribution shifts in defocused images generally lead to performance degradation of existing methods during out-of-distribution inferences. In this work, we gauge the intrinsic reason behind the performance degradation, which is identified as the heterogeneity of lens-specific point spread functions… ▽ More

    Submitted 23 February, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Journal ref: International Journal of Computer Vision 2025

  14. arXiv:2412.20418  [pdf, other]

    eess.IV cs.CV

    Diff4MMLiTS: Advanced Multimodal Liver Tumor Segmentation via Diffusion-Based Image Synthesis and Alignment

    Authors: Shiyun Chen, Li Lin, Pujin Cheng, ZhiCheng Jin, JianJian Chen, HaiDong Zhu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Multimodal learning has been demonstrated to enhance performance across various clinical tasks, owing to the diverse perspectives offered by different modalities of data. However, existing multimodal segmentation methods rely on well-registered multimodal data, which is unrealistic for real-world clinical images, particularly for indistinct and diffuse regions such as liver tumors. In this paper,… ▽ More

    Submitted 29 December, 2024; originally announced December 2024.

  15. arXiv:2412.19123  [pdf, other]

    cs.SD cs.MM eess.AS

    CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition

    Authors: Kaixing Yang, Xulong Tang, Haoyu Wu, Qinliang Xue, Biao Qin, Hongyan Liu, Zhaoxin Fan

    Abstract: Dance generation is crucial and challenging, particularly in domains like dance performance and virtual gaming. In the current body of literature, most methodologies focus on Solo Music2Dance. While there are efforts directed towards Group Music2Dance, these often suffer from a lack of coherence, resulting in aesthetically poor dance performances. Thus, we introduce CoheDancers, a novel framework… ▽ More

    Submitted 26 December, 2024; originally announced December 2024.

  16. arXiv:2412.18749  [pdf, other]

    eess.SY

    RIS-Assisted Simultaneous Legitimate Monitoring and Jamming for Industrial Wireless Networks

    Authors: Likang Zhang, Qinghe Du, Yijing Ren, Xiao Tang, Maged Elkashlan, Zhu Han

    Abstract: In this paper, we study reconfigurable intelligent surface (RIS)-assisted simultaneous legitimate monitoring and jamming techniques for industrial environments, so that the legitimate monitor (LM) and legitimate jammers (LJs) can sustainably monitor and interfere with suspicious communications with minimum transmission power. Specifically, we propose a Block Coordinate Descent-Particle Swarm Optimizati… ▽ More

    Submitted 24 December, 2024; originally announced December 2024.

  17. Real-Time AIoT for UAV Antenna Interference Detection via Edge-Cloud Collaboration

    Authors: Jun Dong, Jintao Cheng, Jin Wu, Chengxi Zhang, Shunyi Zhao, Xiaoyu Tang

    Abstract: In the fifth-generation (5G) era, eliminating communication interference sources is crucial for maintaining network performance. Interference often originates from unauthorized or malfunctioning antennas, and radio monitoring agencies must address numerous sources of such antennas annually. Unmanned aerial vehicles (UAVs) can improve inspection efficiency. However, the data transmission delay in t… ▽ More

    Submitted 4 December, 2024; originally announced December 2024.

  18. arXiv:2411.16380  [pdf, other]

    eess.IV cs.AI cs.CV

    Privacy-Preserving Federated Foundation Model for Generalist Ultrasound Artificial Intelligence

    Authors: Yuncheng Jiang, Chun-Mei Feng, Jinke Ren, Jun Wei, Zixun Zhang, Yiwen Hu, Yunbi Liu, Rui Sun, Xuemei Tang, Juan Du, Xiang Wan, Yong Xu, Bo Du, Xin Gao, Guangyu Wang, Shaohua Zhou, Shuguang Cui, Rick Siow Mong Goh, Yong Liu, Zhen Li

    Abstract: Ultrasound imaging is widely used in clinical diagnosis due to its non-invasive nature and real-time capabilities. However, conventional ultrasound diagnostics face several limitations, including high dependence on physician expertise and suboptimal image quality, which complicates interpretation and increases the likelihood of diagnostic errors. Artificial intelligence (AI) has emerged as a promi… ▽ More

    Submitted 25 November, 2024; originally announced November 2024.

  19. arXiv:2411.06720  [pdf, other]

    cs.LG eess.SP

    Real-time Monitoring and Analysis of Track and Field Athletes Based on Edge Computing and Deep Reinforcement Learning Algorithm

    Authors: Xiaowei Tang, Bin Long, Li Zhou

    Abstract: This research focuses on real-time monitoring and analysis of track and field athletes, addressing the limitations of traditional monitoring systems in terms of real-time performance and accuracy. We propose an IoT-optimized system that integrates edge computing and deep learning algorithms. Traditional systems often experience delays and reduced accuracy when handling complex motion data, whereas… ▽ More

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 17 pages

  20. arXiv:2411.02951  [pdf, other]

    eess.IV cs.CV

    LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior

    Authors: Xingjian Tang, Jingwei Guan, Linge Li, Ran Shi, Youmei Zhang, Mengye Lyu, Li Yan

    Abstract: Diffusion models, as powerful generative models, have found a wide range of applications and shown great potential in solving image reconstruction problems. Some works attempted to solve MRI reconstruction with diffusion models, but these methods operate directly in pixel space, leading to higher computational costs for optimization and inference. Latent diffusion models, pre-trained on natural im… ▽ More

    Submitted 5 March, 2025; v1 submitted 5 November, 2024; originally announced November 2024.

  21. arXiv:2411.00433  [pdf, ps, other]

    eess.SP

    Joint Beamforming for Multi-target Detection and Multi-user Communication in ISAC Systems

    Authors: Zongyao Zhao, Zhenyu Liu, Rui Jiang, Zhongyi Li, Xiao-Ping Zhang, Xinke Tang, Yuhan Dong

    Abstract: Detecting weak targets is one of the main challenges for integrated sensing and communication (ISAC) systems. Sensing and communication suffer from a performance trade-off in ISAC systems. As the communication demand increases, sensing ability, especially weak target detection performance, will inevitably reduce. Traditional approaches fail to address this issue. In this paper, we develop a joint… ▽ More

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: 5 pages, 4 figures, submitted to IEEE journal

  22. arXiv:2410.20344  [pdf, other]

    eess.SP cs.IT

    Deep Learning-Assisted Jamming Mitigation with Movable Antenna Array

    Authors: Xiao Tang, Yudan Jiang, Jinxin Liu, Qinghe Du, Dusit Niyato, Zhu Han

    Abstract: This paper reveals the potential of movable antennas in enhancing anti-jamming communication. We consider a legitimate communication link in the presence of multiple jammers and propose deploying a movable antenna array at the receiver to combat jamming attacks. We formulate the problem as a signal-to-interference-plus-noise ratio maximization, by jointly optimizing the receive beamforming and ant… ▽ More

    Submitted 4 April, 2025; v1 submitted 27 October, 2024; originally announced October 2024.

    Comments: Accepted @ IEEE TVT

  23. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Yao Hu, Kun Liu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data… ▽ More

    Submitted 11 April, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  24. arXiv:2409.02797  [pdf, ps, other]

    eess.SP

    Joint Beamforming for Backscatter Integrated Sensing and Communication

    Authors: Zongyao Zhao, Tiankuo Wei, Zhenyu Liu, Xinke Tang, Xiao-Ping Zhang, Yuhan Dong

    Abstract: Integrated sensing and communication (ISAC) is a key technology of next generation wireless communication. Backscatter communication (BackCom) plays an important role for internet of things (IoT). Then the integration of ISAC with BackCom technology enables low-power data transmission while enhancing the system sensing ability, which is expected to provide a potentially revolutionary solution for… ▽ More

    Submitted 4 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 6 pages, 4 figures, IEEE Global Communications Conference (Globecom) 2024. This paper is the conference version of the following work: arXiv:2407.19235

  25. arXiv:2409.02421  [pdf, other]

    cs.SD eess.AS

    MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision

    Authors: Jiatao Chen, Tianming Xie, Xing Tang, Jing Wang, Wenjing Dong, Bing Shi

    Abstract: In recent years, deep learning has significantly advanced the MIDI domain, solidifying music generation as a key application of artificial intelligence. However, existing research primarily focuses on Western music and encounters challenges in generating melodies for Chinese traditional music, especially in capturing modal characteristics and emotional expression. To address these issues, we propo… ▽ More

    Submitted 5 March, 2025; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: Accepted by ICASSP 2025

  26. arXiv:2408.13061  [pdf, other]

    eess.IV physics.optics

    General Intelligent Imaging and Uncertainty Quantification by Deterministic Diffusion Model

    Authors: Weiru Fan, Xiaobin Tang, Yiyi Liao, Da-Wei Wang

    Abstract: Computational imaging is crucial in many disciplines from autonomous driving to life sciences. However, traditional model-driven and iterative methods consume large computational power and lack scalability for imaging. Deep learning (DL) is effective in processing local-to-local patterns, but it struggles with handling universal global-to-local (nonlocal) patterns under current frameworks. To brid… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  27. arXiv:2408.10067  [pdf, other]

    eess.IV cs.CV

    Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development

    Authors: Yuncheng Jiang, Yiwen Hu, Zixun Zhang, Jun Wei, Chun-Mei Feng, Xuemei Tang, Xiang Wan, Yong Liu, Shuguang Cui, Zhen Li

    Abstract: Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS s… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  28. arXiv:2407.19235  [pdf, ps, other]

    eess.SP eess.SY

    B-ISAC: Backscatter Integrated Sensing and Communication for IoE Applications

    Authors: Zongyao Zhao, Yuhan Dong, Tiankuo Wei, Xinke Tang, Xiao-Ping Zhang, Zhenyu Liu

    Abstract: The integration of backscatter communication (BackCom) technology with integrated sensing and communication (ISAC) technology not only enhances the system sensing performance, but also enables low-power information transmission. This is expected to provide a new paradigm for communication and sensing in internet of everything (IoE) applications. In this paper, we propose a novel cognitive wireless… ▽ More

    Submitted 1 November, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: 15 pages, 12 figures, submitted to IEEE Journal. This paper is the journal version of the following paper: arXiv:2409.02797

  29. Automatic modulation classification for MIMO system based on the mutual information feature extraction

    Authors: N. Ussipov, S. Akhtanov, Z. Zhanabaev, D. Turlykozhayeva, B. Karibayev, T. Namazbayev, D. Almen, A. Akhmetali, X. Tang

    Abstract: Automatic Modulation Classification (AMC) is an essential technology that is widely applied into various communications scenarios. In recent years, many Machine Learning and Deep-Learning methods have been introduced into AMC, and a lot of them apply different approaches to eliminate interference in complex Multiple-Input and Multiple-Output (MIMO) signals and improve classification performance. H… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: IEEE Access (2024)

    Report number: IEEE Access, vol. 12, pp. 68463-68470, 2024

    Journal ref: IEEE Access, vol. 12, pp. 68463-68470, 2024

  30. arXiv:2406.02014  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, Jin Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  31. arXiv:2405.17295  [pdf, other]

    eess.SP

    In-sensor Computing ANN Capacitive Sensors

    Authors: Guihua Zhao, Yating Peng, Jiaxin Zhu, Xin Tang, Zhiyi Yu

    Abstract: This letter proposes an in-sensor computing multiply-and-accumulate (MAC) circuit based on capacitance. The MAC circuits can constitute an artificial neural network (ANN) layer and be operated as ANN classifiers and autoencoders. The proposed circuit is a promising scheme for capacitive ANN image sensors, showing competitively high efficiency and lower power.

    Submitted 27 May, 2024; originally announced May 2024.

  32. arXiv:2405.07202  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM cs.SD eess.AS

    Unified Video-Language Pre-training with Synchronized Audio

    Authors: Shentong Mo, Haofan Wang, Huaxia Li, Xu Tang

    Abstract: Video-language pre-training is a typical and challenging problem that aims at learning visual and textual representations from large-scale data in a self-supervised way. Existing pre-training approaches either captured the correspondence of image-text pairs or utilized temporal ordering of frames. However, they do not explicitly explore the natural synchronization between audio and the other two m… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  33. arXiv:2403.13346  [pdf, other]

    eess.SY

    A Control-Recoverable Added-Noise-based Privacy Scheme for LQ Control in Networked Control Systems

    Authors: Xuening Tang, Xianghui Cao, Wei Xing Zheng

    Abstract: As networked control systems continue to evolve, ensuring the privacy of sensitive data becomes an increasingly pressing concern, especially in situations where the controller is physically separated from the plant. In this paper, we propose a secure control scheme for computing linear quadratic control in a networked control system utilizing two networked controllers, a privacy encoder and a cont… ▽ More

    Submitted 20 October, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  34. Object Segmentation-Assisted Inter Prediction for Versatile Video Coding

    Authors: Zhuoyuan Li, Zikun Yuan, Li Li, Dong Liu, Xiaohu Tang, Feng Wu

    Abstract: In modern video coding standards, block-based inter prediction is widely adopted, which brings high compression efficiency. However, in natural videos, there are usually multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to represent compactly. This problem has been tackled by more flexible block partitioning methods in the Versatile Video Coding (VV… ▽ More

    Submitted 12 September, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 20 pages, 13 figures, accepted by IEEE Transactions on Broadcasting (TBC)

  35. arXiv:2403.04743  [pdf, other]

    eess.AS

    Speech Emotion Recognition Via CNN-Transformer and Multidimensional Attention Mechanism

    Authors: Xiaoyu Tang, Yixin Lin, Ting Dang, Yuanfang Zhang, Jintao Cheng

    Abstract: Speech Emotion Recognition (SER) is crucial in human-machine interactions. Mainstream approaches utilize Convolutional Neural Networks or Recurrent Neural Networks to learn local energy feature representations of speech segments from speech information, but struggle with capturing global information such as the duration of energy in speech. Some use Transformers to capture global information, but… ▽ More

    Submitted 4 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  36. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing… ▽ More

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  37. arXiv:2402.09752  [pdf]

    physics.optics eess.SY physics.app-ph quant-ph

    Vector spectrometer with Hertz-level resolution and super-recognition capability

    Authors: Ting Qing, Shupeng Li, Huashan Yang, Lihan Wang, Yijie Fang, Xiaohu Tang, Meihui Cao, Jianming Lu, Jijun He, Junqiu Liu, Yueguang Lyu, Shilong Pan

    Abstract: High-resolution optical spectrometers are crucial in revealing intricate characteristics of signals, determining laser frequencies, measuring physical constants, identifying substances, and advancing biosensing applications. Conventional spectrometers, however, often grapple with inherent trade-offs among spectral resolution, wavelength range, and accuracy. Furthermore, even at high resolution, re… ▽ More

    Submitted 6 March, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 21 pages, 6 figures

  38. arXiv:2312.07226  [pdf, other]

    eess.IV cs.CV

    Super-Resolution on Rotationally Scanned Photoacoustic Microscopy Images Incorporating Scanning Prior

    Authors: Kai Pan, Linyang Li, Li Lin, Pujin Cheng, Junyan Lyu, Lei Xi, Xiaoyin Tang

    Abstract: Photoacoustic Microscopy (PAM) images integrating the advantages of optical contrast and acoustic resolution have been widely used in brain studies. However, there exists a trade-off between scanning speed and image resolution. Compared with traditional raster scanning, rotational scanning provides good opportunities for fast PAM imaging by optimizing the scanning mechanism. Recently, there is a t… ▽ More

    Submitted 5 March, 2025; v1 submitted 12 December, 2023; originally announced December 2023.

  39. arXiv:2312.01726  [pdf, other]

    eess.IV cs.CV

    Simultaneous Alignment and Surface Regression Using Hybrid 2D-3D Networks for 3D Coherent Layer Segmentation of Retinal OCT Images with Full and Sparse Annotations

    Authors: Hong Liu, Dong Wei, Donghuan Lu, Xiaoying Tang, Liansheng Wang, Yefeng Zheng

    Abstract: Layer segmentation is important to quantitative analysis of retinal optical coherence tomography (OCT). Recently, deep learning based methods have been developed to automate this task and yield remarkable performance. However, due to the large spatial gap and potential mismatch between the B-scans of an OCT volume, all of them were based on 2D segmentation of individual B-scans, which may lose the… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Accepted by MIA. arXiv admin note: text overlap with arXiv:2203.02390

  40. arXiv:2312.01544  [pdf, other]

    cs.LG cs.AI eess.SY

    KEEC: Koopman Embedded Equivariant Control

    Authors: Xiaoyuan Cheng, Yiming Yang, Xiaohang Tang, Wei Jiang, Yukun Hu

    Abstract: An efficient way to control systems with unknown nonlinear dynamics is to find an appropriate embedding or representation for simplified approximation (e.g. linearization), which facilitates system identification and control synthesis. Nevertheless, there has been a lack of embedding methods that can guarantee (i) embedding the dynamical system comprehensively, including the vector fields (ODE for… ▽ More

    Submitted 27 February, 2025; v1 submitted 3 December, 2023; originally announced December 2023.

  41. arXiv:2311.10641  [pdf]

    physics.med-ph eess.IV

    Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss

    Authors: Junbo Peng, Chih-Wei Chang, Huiqiao Xie, Richard L. J. Qiu, Justin Roper, Tonghe Wang, Beth Bradshaw, Xiangyang Tang, Xiaofeng Yang

    Abstract: Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately… ▽ More

    Submitted 17 November, 2023; originally announced November 2023.

  42. arXiv:2310.10300  [pdf, other]

    cs.SD cs.IR eess.AS

    BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval

    Authors: Kaixing Yang, Xukun Zhou, Xulong Tang, Ran Diao, Hongyan Liu, Jun He, Zhaoxin Fan

    Abstract: Dance and music are closely related forms of expression, with mutual retrieval between dance videos and music being a fundamental task in various fields like education, art, and sports. However, existing methods often suffer from unnatural generation effects or fail to fully explore the correlation between music and dance. To overcome these challenges, we propose BeatDance, a novel beat-based mode…

    Submitted 16 October, 2023; originally announced October 2023.

  43. arXiv:2310.09071  [pdf, other]

    cs.LG eess.SY

    Online Relocating and Matching of Ride-Hailing Services: A Model-Based Modular Approach

    Authors: Chang Gao, Xi Lin, Fang He, Xindi Tang

    Abstract: This study proposes an innovative model-based modular approach (MMA) to dynamically optimize order matching and vehicle relocation in a ride-hailing platform. MMA utilizes a two-layer and modular modeling structure. The upper layer determines the spatial transfer patterns of vehicle flow within the system to maximize the total revenue of the current and future stages. With the guidance provided by…

    Submitted 13 October, 2023; originally announced October 2023.

  44. arXiv:2310.08095  [pdf, other]

    cs.IT eess.SP

    Multi-Satellite Cooperative Networks: Joint Hybrid Beamforming and User Scheduling Design

    Authors: Xuan Zhang, Shu Sun, Meixia Tao, Qin Huang, Xiaohu Tang

    Abstract: In this paper, we consider a cooperative communication network where multiple low-Earth-orbit (LEO) satellites provide services to multiple ground users (GUs) cooperatively at the same time and on the same frequency. The multi-satellite cooperation has great potential in extending communication coverage and increasing spectral efficiency. Considering that the on-board radio-frequency circuit resou…

    Submitted 27 December, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: 14 pages, 13 figures. arXiv admin note: substantial text overlap with arXiv:2301.03888

  45. arXiv:2310.05547  [pdf, other]

    cs.RO eess.SY

    Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments

    Authors: Yulin Li, Xindong Tang, Kai Chen, Chunxin Zheng, Haichao Liu, Jun Ma

    Abstract: This work proposes a safety-critical local reactive controller that enables the robot to navigate in unknown and cluttered environments. In particular, the trajectory tracking task is formulated as a constrained polynomial optimization problem. Then, safety constraints are imposed on the control variables invoking the notion of polynomial positivity certificates in conjunction with their Sum-of-Sq…

    Submitted 9 October, 2023; originally announced October 2023.

  46. arXiv:2308.15860  [pdf]

    eess.IV

    Research on Image Stitching Based on Invariant Features of Reconstructed Plane

    Authors: Qi Liu, Xiyu Tang, Ju Huo

    Abstract: Generating high-quality stitched images is a challenging task in computer vision. The existing feature-based image stitching methods commonly only focus on point and line features, neglecting the crucial role of higher-level planar features in image stitching. This paper proposes an image stitching method based on invariant planar features, which uses planar features as constraints to improve the…

    Submitted 30 August, 2023; originally announced August 2023.

  47. arXiv:2307.06742  [pdf, other]

    eess.SY cs.AI cs.LG

    Vehicle Dispatching and Routing of On-Demand Intercity Ride-Pooling Services: A Multi-Agent Hierarchical Reinforcement Learning Approach

    Authors: Jinhua Si, Fang He, Xi Lin, Xindi Tang

    Abstract: The integrated development of city clusters has given rise to an increasing demand for intercity travel. Intercity ride-pooling service exhibits considerable potential in upgrading traditional intercity bus services by implementing demand-responsive enhancements. Nevertheless, its online operations suffer the inherent complexities due to the coupling of vehicle resource allocation among cities and…

    Submitted 20 March, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

  48. arXiv:2305.18733  [pdf, other]

    eess.SP

    Low-Complexity Signal Detection for the Splitting Receiver Scheme

    Authors: Yanyan Wang, Qidi Li, Xiaohu Tang

    Abstract: This letter proposes a low-complexity signal detection method for the splitting receiver scheme, which achieves an excellent symbol error rate (SER) performance. Based on the three-dimensional (3D) received signal of the splitting receiver, we derive an equivalent two-dimensional (2D) signal model and develop a low-complexity signal detection method for the practical modulation scheme. The computa…

    Submitted 30 May, 2023; originally announced May 2023.

  49. arXiv:2305.11504  [pdf, other]

    eess.IV cs.CV cs.LG

    JOINEDTrans: Prior Guided Multi-task Transformer for Joint Optic Disc/Cup Segmentation and Fovea Detection

    Authors: Huaqing He, Li Lin, Zhiyuan Cai, Pujin Cheng, Xiaoying Tang

    Abstract: Deep learning-based image segmentation and detection models have largely improved the efficiency of analyzing retinal landmarks such as optic disc (OD), optic cup (OC), and fovea. However, factors including ophthalmic disease-related lesions and low image quality issues may severely complicate automatic OD/OC segmentation and fovea detection. Most existing works treat the identification of each la…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: 11 pages, 6 figures

  50. arXiv:2304.06258  [pdf, other]

    cs.CV cs.LG eess.IV

    MProtoNet: A Case-Based Interpretable Model for Brain Tumor Classification with 3D Multi-parametric Magnetic Resonance Imaging

    Authors: Yuanyuan Wei, Roger Tam, Xiaoying Tang

    Abstract: Recent applications of deep convolutional neural networks in medical imaging raise concerns about their interpretability. While most explainable deep learning applications use post hoc methods (such as GradCAM) to generate feature attribution maps, there is a new type of case-based reasoning models, namely ProtoPNet and its variants, which identify prototypes during training and compare input imag…

    Submitted 14 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: 15 pages, 5 figures, 1 table; accepted for oral presentation at MIDL 2023 (https://openreview.net/forum?id=6Wbj3QCo4U4); camera-ready version