
Showing 1–50 of 98 results for author: Xiao, L

Searching in archive eess.
  1. arXiv:2506.08967 [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  2. arXiv:2505.19476 [pdf, ps, other]

    eess.AS eess.SP

    FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching

    Authors: Ziqian Wang, Zikai Liu, Xinfa Zhu, Yike Zhu, Mingshuai Liu, Jun Chen, Longshuai Xiao, Chao Weng, Lei Xie

    Abstract: Generative models have excelled in audio tasks using approaches such as language models, diffusion, and flow matching. However, existing generative approaches for speech enhancement (SE) face notable challenges: language model-based methods suffer from quantization loss, leading to compromised speaker similarity and intelligibility, while diffusion models require complex training and high inferenc…

    Submitted 27 May, 2025; v1 submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted to InterSpeech 2025

  3. arXiv:2505.15536 [pdf, ps, other]

    eess.SY cs.DC

    DeepCEE: Efficient Cross-Region Model Distributed Training System under Heterogeneous GPUs and Networks

    Authors: Jinquan Wang, Xiaojian Liao, Xuzhao Liu, Jiashun Suo, Zhisheng Huo, Chenhao Zhang, Xiangrong Xu, Runnan Shen, Xilong Xie, Limin Xiao

    Abstract: Most existing training systems focus on a single region. In contrast, we envision that cross-region training offers more flexible GPU resource allocation and yields significant potential. However, the hierarchical cluster topology and unstable networks in the cloud-edge-end (CEE) environment, a typical cross-region scenario, pose substantial challenges to building an efficient and autonomous model…

    Submitted 27 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2504.15260 [pdf, other]

    eess.SP

    Joint Knowledge and Power Management for Secure Semantic Communication Networks

    Authors: Xuesong Liu, Yansong Liu, Haoyu Tang, Fangzhou Zhao, Le Xia, Yao Sun

    Abstract: Recently, semantic communication (SemCom) has shown its great superiorities in resource savings and information exchanges. However, while its unique background knowledge guarantees accurate semantic reasoning and recovery, semantic information security-related concerns are introduced at the same time. Since the potential eavesdroppers may have the same background knowledge to accurately decrypt th…

    Submitted 21 April, 2025; originally announced April 2025.

  5. arXiv:2504.04928 [pdf, other]

    eess.SP

    Advanced Codebook Design for SCMA-aided NTNs With Randomly Distributed Users

    Authors: Tianyang Hu, Qu Luo, Lixia Xiao, Jiaxi Zhou, Pei Xiao, Tao Jiang

    Abstract: In this letter, a novel class of sparse codebooks is proposed for sparse code multiple access (SCMA) aided non-terrestrial networks (NTN) with randomly distributed users characterized by Rician fading channels. Specifically, we first exploit the upper bound of bit error probability (BEP) of an SCMA-aided NTN with large-scale fading of different users under Rician fading channels. Then, the codeboo…

    Submitted 7 April, 2025; originally announced April 2025.

  6. arXiv:2504.03289 [pdf, other]

    cs.SD cs.CL eess.AS

    RWKVTTS: Yet another TTS based on RWKV-7

    Authors: Lin yueyu, Liu Xiao

    Abstract: Human-AI interaction thrives on intuitive and efficient interfaces, among which voice stands out as a particularly natural and accessible modality. Recent advancements in transformer-based text-to-speech (TTS) systems, such as Fish-Speech, CosyVoice, and MegaTTS 3, have delivered remarkable improvements in quality and realism, driving a significant evolution in the TTS domain. In this paper, we in…

    Submitted 4 April, 2025; originally announced April 2025.

  7. arXiv:2503.21102 [pdf, other]

    eess.SP

    Amplitude-Domain Reflection Modulation for Active RIS-Assisted Wireless Communications

    Authors: Jing Zhu, Qu Luo, Zheng Chu, Gaojie Chen, Pei Xiao, Lixia Xiao, Chaoyun Song

    Abstract: In this paper, we propose a novel active reconfigurable intelligent surface (RIS)-assisted amplitude-domain reflection modulation (ADRM) transmission scheme, termed as ARIS-ADRM. This innovative approach leverages the additional degree of freedom (DoF) provided by the amplitude domain of the active RIS to perform index modulation (IM), thereby enhancing spectral efficiency (SE) without increasing…

    Submitted 26 March, 2025; originally announced March 2025.

  8. arXiv:2503.00493 [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

    Authors: Boyi Kang, Xinfa Zhu, Zihan Zhang, Zhen Ye, Mingshuai Liu, Ziqian Wang, Yike Zhu, Guobin Ma, Jun Chen, Longshuai Xiao, Chao Weng, Wei Xue, Lei Xie

    Abstract: Recent advancements in language models (LMs) have demonstrated strong capabilities in semantic understanding and contextual modeling, which have flourished in generative speech enhancement (SE). However, many LM-based SE approaches primarily focus on semantic information, often neglecting the critical role of acoustic information, which leads to acoustic inconsistency after enhancement and limited…

    Submitted 10 June, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

    Comments: ACL2025 main, Codes available at https://github.com/Kevin-naticl/LLaSE-G1

  9. arXiv:2502.11946 [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  10. arXiv:2501.18350 [pdf, ps, other]

    eess.SY

    Joint Power and Spectrum Orchestration for D2D Semantic Communication Underlying Energy-Efficient Cellular Networks

    Authors: Le Xia, Yao Sun, Haijian Sun, Rose Qingyang Hu, Dusit Niyato, Muhammad Ali Imran

    Abstract: Semantic communication (SemCom) has been recently deemed a promising next-generation wireless technique to enable efficient spectrum savings and information exchanges, thus naturally introducing a novel and practical network paradigm where cellular and device-to-device (D2D) SemCom approaches coexist. Nevertheless, the involved wireless resource management becomes complicated and challenging due t…

    Submitted 23 June, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: This paper has been submitted to IEEE Trans. on Wireless Communications for the second round of peer review after major revisions

  11. arXiv:2501.12487 [pdf]

    cs.CV cs.AI eess.IV

    fabSAM: A Farmland Boundary Delineation Method Based on the Segment Anything Model

    Authors: Yufeng Xie, Hanzhi Wu, Hongxiang Tong, Lei Xiao, Wenwen Zhou, Ling Li, Thomas Cherico Wanger

    Abstract: Delineating farmland boundaries is essential for agricultural management such as crop monitoring and agricultural census. Traditional methods using remote sensing imagery have been efficient but limited in generalisation. The Segment Anything Model (SAM), known for its impressive zero shot performance, has been adapted for remote sensing tasks through prompt learning and fine tuning. Here, we prop…

    Submitted 21 January, 2025; originally announced January 2025.

  12. arXiv:2412.08830 [pdf, other]

    cs.RO eess.SY

    EMATO: Energy-Model-Aware Trajectory Optimization for Autonomous Driving

    Authors: Zhaofeng Tian, Lichen Xia, Weisong Shi

    Abstract: Autonomous driving lacks strong proof of energy efficiency with the energy-model-agnostic trajectory planning. To achieve an energy consumption model-aware trajectory planning for autonomous driving, this study proposes an online nonlinear programming method that optimizes the polynomial trajectories generated by the Frenet polynomial method while considering both traffic trajectories and road slo…

    Submitted 11 December, 2024; originally announced December 2024.

  13. arXiv:2412.01053 [pdf, ps, other]

    cs.SD eess.AS

    FreeCodec: A disentangled neural speech codec with fewer tokens

    Authors: Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

    Abstract: Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most works based on residual vector quantization perform worse with fewer tokens due to low coding efficiency for modeling complex coupled information. In this p…

    Submitted 28 June, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: 5 pages, 2 figures, 3 tables. Code and Demo page: https://github.com/exercise-book-yq/FreeCodec. Accepted to Interspeech 2025

  14. arXiv:2409.19331 [pdf, other]

    eess.SP

    Wireless Environment Information Sensing, Feature, Semantic, and Knowledge: Four Steps Towards 6G AI-Enabled Air Interface

    Authors: Jianhua Zhang, Yichen Cai, Li Yu, Zhen Zhang, Yuxiang Zhang, Jialin Wang, Tao Jiang, Liang Xia, Ping Zhang

    Abstract: The air interface technology plays a crucial role in optimizing the communication quality for users. To address the challenges brought by the radio channel variations to air interface design, this article proposes a framework of wireless environment information-aided 6G AI-enabled air interface (WEI-6G AI$^{2}$), which actively acquires real-time environment details to facilitate channel fading pr…

    Submitted 28 September, 2024; originally announced September 2024.

  15. arXiv:2409.16678 [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    TSBP: Improving Object Detection in Histology Images via Test-time Self-guided Bounding-box Propagation

    Authors: Tingting Yang, Liang Xiao, Yizhe Zhang

    Abstract: A global threshold (e.g., 0.5) is often applied to determine which bounding boxes should be included in the final results for an object detection task. A higher threshold reduces false positives but may result in missing a significant portion of true positives. A lower threshold can increase detection recall but may also result in more false positives. Because of this, using a preset global thresh…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: MICCAI 2024

  16. arXiv:2408.10670 [pdf]

    cs.CV eess.IV

    A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning

    Authors: Deyu Li, Longfei Xiao, Handi Wei, Yan Li, Binghua Zhang

    Abstract: The accurate measurement of the wave field and its spatiotemporal evolution is essential in many hydrodynamic experiments and engineering applications. The binocular stereo imaging technique has been widely used to measure waves. However, the optical properties of indoor water surfaces, including transparency, specular reflection, and texture absence, pose challenges for image processing and stere…

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.07820 [pdf, other]

    cs.NI cs.IT eess.SY

    Hybrid Semantic/Bit Communication Based Networking Problem Optimization

    Authors: Le Xia, Yao Sun, Dusit Niyato, Lan Zhang, Lei Zhang, Muhammad Ali Imran

    Abstract: This paper jointly investigates user association (UA), mode selection (MS), and bandwidth allocation (BA) problems in a novel and practical next-generation cellular network where two modes of semantic communication (SemCom) and conventional bit communication (BitCom) coexist, namely hybrid semantic/bit communication network (HSB-Net). Concretely, we first identify a unified performance metric of m…

    Submitted 19 August, 2024; v1 submitted 30 July, 2024; originally announced August 2024.

    Comments: This paper has been accepted for publication and will be presented in 2024 IEEE Global Communications Conference (GlobeCom 2024). arXiv admin note: substantial text overlap with arXiv:2404.04162

  18. arXiv:2407.20530 [pdf, other]

    cs.SD eess.AS

    SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

    Authors: Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

    Abstract: Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit superior compression performance than conventional methods. Despite significant progress, existing methods still have limitations in preserving and reconstructing fine details for optimal reconstruction, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that ach…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ICASSP 2024

  19. arXiv:2407.07397 [pdf, other]

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t…

    Submitted 10 July, 2024; originally announced July 2024.

  20. arXiv:2406.18547 [pdf]

    eess.IV cs.CV

    Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

    Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

    Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator networ…

    Submitted 22 May, 2024; originally announced June 2024.

  21. arXiv:2406.16981 [pdf]

    eess.IV cs.AI cs.LG eess.SP

    Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

    Authors: Lingxi Xiao, Jinxin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

    Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera…

    Submitted 23 June, 2024; originally announced June 2024.

  22. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zijin Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 21 October, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Report number: XAIxArts/2024/0

  23. arXiv:2406.12456 [pdf, other]

    eess.IV cs.CV

    Deep-learning-based groupwise registration for motion correction of cardiac $T_1$ mapping

    Authors: Yi Zhang, Yidong Zhao, Lu Huang, Liming Xia, Qian Tao

    Abstract: Quantitative $T_1$ mapping by MRI is an increasingly important tool for clinical assessment of cardiovascular diseases. The cardiac $T_1$ map is derived by fitting a known signal model to a series of baseline images, while the quality of this map can be deteriorated by involuntary respiratory and cardiac motion. To correct motion, a template image is often needed to register all baseline images, b…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024. Contents may slightly differ from the camera-ready version

  24. arXiv:2406.00690 [pdf, other]

    eess.SP

    Electromagnetic Wave Property Inspired Radio Environment Knowledge Construction and AI-based Verification for 6G Digital Twin Channel

    Authors: Jialin Wang, Jianhua Zhang, Yutong Sun, Yuxiang Zhang, Tao Jiang, Liang Xia

    Abstract: As the underlying foundation of a digital twin network (DTN), a digital twin channel (DTC) can accurately depict the process of radio propagation in the air interface to support the DTN-based 6G wireless network. Since radio propagation is affected by the environment, constructing the relationship between the environment and radio wave propagation is the key to improving the accuracy of DTC, and t…

    Submitted 2 June, 2024; originally announced June 2024.

  25. arXiv:2405.03119 [pdf, ps, other]

    cs.IT eess.SP

    DAFT-Spread Affine Frequency Division Multiple Access for Downlink Transmission

    Authors: Yiwei Tao, Miaowen Wen, Yao Ge, Tianqi Mao, Lixia Xiao, Jun Li

    Abstract: Affine frequency division multiplexing (AFDM) and orthogonal AFDM access (O-AFDMA) are promising techniques based on chirp signals, which are able to suppress the performance deterioration caused by Doppler shifts in high-mobility scenarios. However, the high peak-to-average power ratio (PAPR) in AFDM or O-AFDMA is still a crucial problem, which severely limits their practical applications. In thi…

    Submitted 5 May, 2024; originally announced May 2024.

  26. arXiv:2404.10232 [pdf, other]

    eess.SP

    Channel Estimation for AFDM With Superimposed Pilots

    Authors: Kai Zheng, Miaowen Wen, Tianqi Mao, Lixia Xiao, Zhaocheng Wang

    Abstract: The recent proposed affine frequency division multiplexing (AFDM) employing a multi-chirp waveform has shown its reliability and robustness in doubly selective fading channels. In the existing embedded pilot-aided channel estimation methods, the presence of guard symbols in the discrete affine Fourier transform (DAFT) domain causes inevitable degradation of the spectral efficiency (SE). To improve…

    Submitted 15 April, 2024; originally announced April 2024.

  27. Wireless Resource Optimization in Hybrid Semantic/Bit Communication Networks

    Authors: Le Xia, Yao Sun, Dusit Niyato, Lan Zhang, Muhammad Ali Imran

    Abstract: Recently, semantic communication (SemCom) has shown great potential in significant resource savings and efficient information exchanges, thus naturally introducing a novel and practical cellular network paradigm where two modes of SemCom and conventional bit communication (BitCom) coexist. Nevertheless, the involved wireless resource management becomes rather complicated and challenging, given the…

    Submitted 21 October, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted for publication by the IEEE Transactions on Communications

  28. arXiv:2403.09355 [pdf, other]

    eess.IV cs.CV

    Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction

    Authors: Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao

    Abstract: Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Although diffusion model-based approaches are computationally expensive and suffer from the training-sampling discrepancy, they provide a potential solution to the problem. This study introduces a novel Cascaded Diffusion with Discrepancy Mit…

    Submitted 14 March, 2024; originally announced March 2024.

  29. arXiv:2403.07951 [pdf, other]

    eess.IV cs.CV cs.LG

    SAMDA: Leveraging SAM on Few-Shot Domain Adaptation for Electronic Microscopy Segmentation

    Authors: Yiran Wang, Li Xiao

    Abstract: It has been shown that traditional deep learning methods for electronic microscopy segmentation usually suffer from low transferability when samples and annotations are limited, while large-scale vision foundation models are more robust when transferring between different domains but facing sub-optimal improvement under fine-tuning. In this work, we present a new few-shot domain adaptation framewo…

    Submitted 11 March, 2024; originally announced March 2024.

  30. arXiv:2401.12468 [pdf, ps, other]

    eess.SY

    Minimum observability of probabilistic Boolean networks

    Authors: Jiayi Xu, Shihua Fu, Liyuan Xia, Jianjun Wang

    Abstract: This paper studies the minimum observability of probabilistic Boolean networks (PBNs), the main objective of which is to add the fewest measurements to make an unobservable PBN become observable. First of all, the algebraic form of a PBN is established with the help of semi-tensor product (STP) of matrices. By combining the algebraic forms of two identical PBNs into a parallel system, a method to…

    Submitted 22 January, 2024; originally announced January 2024.

  31. arXiv:2312.01586 [pdf, ps, other]

    math.OC eess.SY

    On the Maximization of Long-Run Reward CVaR for Markov Decision Processes

    Authors: Li Xia, Zhihui Yu, Peter W. Glynn

    Abstract: This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of instantaneous rewards over an infinite horizon across all history-dependent randomized policies. By establishing two optimality inequalities of opposing directio…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Risk-seeking optimization of CVaR in MDP

  32. arXiv:2312.01125 [pdf, other]

    cs.IT eess.SP

    Design and Performance Analysis of Index Modulation Empowered AFDM System

    Authors: Jing Zhu, Qu Luo, Gaojie Chen, Pei Xiao, Lixia Xiao

    Abstract: In this letter, we incorporate index modulation (IM) into affine frequency division multiplexing (AFDM), called AFDM-IM, to enhance the bit error rate (BER) and energy efficiency (EE) performance. In this scheme, the information bits are conveyed not only by $M$-ary constellation symbols, but also by the activation of the chirp subcarriers (SCs) indices, which are determined based on the incoming…

    Submitted 2 December, 2023; originally announced December 2023.

  33. arXiv:2311.15339 [pdf, other]

    cs.CV cs.CR cs.LG eess.IV

    Adversarial Purification of Information Masking

    Authors: Sitong Liu, Zhichao Lian, Shuangquan Zhang, Liang Xiao

    Abstract: Adversarial attacks meticulously generate minuscule, imperceptible perturbations to images to deceive neural networks. Counteracting these, adversarial purification methods seek to transform adversarial input samples into clean output images to defend against adversarial attacks. Nonetheless, extent generative models fail to effectively eliminate adversarial perturbations, yielding less-than-ideal…

    Submitted 26 November, 2023; originally announced November 2023.

  34. arXiv:2311.11804 [pdf, ps, other]

    eess.SP cs.IT

    Robust Multidimentional Chinese Remainder Theorem for Integer Vector Reconstruction

    Authors: Li Xiao, Haiye Huo, Xiang-Gen Xia

    Abstract: The problem of robustly reconstructing an integer vector from its erroneous remainders appears in many applications in the field of multidimensional (MD) signal processing. To address this problem, a robust MD Chinese remainder theorem (CRT) was recently proposed for a special class of moduli, where the remaining integer matrices left-divided by a greatest common left divisor (gcld) of all the mod…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures

  35. A Spectral Diffusion Prior for Hyperspectral Image Super-Resolution

    Authors: Jianjun Liu, Zebin Wu, Liang Xiao

    Abstract: Fusion-based hyperspectral image (HSI) super-resolution aims to produce a high-spatial-resolution HSI by fusing a low-spatial-resolution HSI and a high-spatial-resolution multispectral image. Such a HSI super-resolution process can be modeled as an inverse problem, where the prior knowledge is essential for obtaining the desired solution. Motivated by the success of diffusion models, we propose a…

    Submitted 15 November, 2023; originally announced November 2023.

    Report number: 5528613

    Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2024

  36. arXiv:2310.16869 [pdf]

    eess.IV physics.optics

    Single-pixel imaging based on deep learning

    Authors: Kai Song, Yaoxing Bian, Ku Wu, Hongrui Liu, Shuangping Han, Jiaming Li, Jiazhao Tian, Chengbin Qin, Jianyong Hu, Liantuan Xiao

    Abstract: Single-pixel imaging can collect images at the wavelengths outside the reach of conventional focal plane array detectors. However, the limited image quality and lengthy computational times for iterative reconstruction still impede the practical application of single-pixel imaging. Recently, deep learning has been introduced into single-pixel imaging, which has attracted a lot of attention due to i…

    Submitted 16 November, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

  37. Distributional Soft Actor-Critic with Three Refinements

    Authors: Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li, Chang Liu, Ya-Qin Zhang, Bo Cheng, Keqiang Li

    Abstract: Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks. However, many model-free RL algorithms experience performance degradation due to inaccurate value estimation, particularly the overestimation of Q-values, which can lead to suboptimal policies. To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSA…

    Submitted 1 February, 2025; v1 submitted 9 October, 2023; originally announced October 2023.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

  38. arXiv:2309.12461 [pdf, other]

    eess.SY cs.NI

    Knowledge Base Aware Semantic Communication in Vehicular Networks

    Authors: Le Xia, Yao Sun, Dusit Niyato, Kairong Ma, Jiawen Kang, Muhammad Ali Imran

    Abstract: Semantic communication (SemCom) has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to vehicular networks, which normally consume a tremendous amount of resources to achieve stringent requirements on high reliability and low latency. Unfortunately, the unique background k…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: This paper has been accepted for publication by 2023 IEEE International Conference on Communications (ICC 2023). arXiv admin note: substantial text overlap with arXiv:2302.11993

  39. arXiv:2309.06981 [pdf, other]

    cs.CR cs.AI cs.LG cs.SD eess.AS

    MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems

    Authors: Hanqing Guo, Xun Chen, Junfeng Guo, Li Xiao, Qiben Yan

    Abstract: Speaker Verification (SV) is widely deployed in mobile systems to authenticate legitimate users by using their voice traits. In this work, we propose a backdoor attack MASTERKEY, to compromise the SV models. Different from previous attacks, we focus on a real-world practical setting where the attacker possesses no knowledge of the intended victim. To design MASTERKEY, we investigate the limitation…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: Accepted by Mobicom 2023

  40. arXiv:2308.15483 [pdf, other]

    cs.NI eess.IV eess.SP

    Generative AI for Semantic Communication: Architecture, Challenges, and Outlook

    Authors: Le Xia, Yao Sun, Chengsi Liang, Lei Zhang, Muhammad Ali Imran, Dusit Niyato

    Abstract: Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorpo…

    Submitted 27 October, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: This magazine article has been accepted for publication by IEEE Wireless Communications

  41. arXiv:2307.13346 [pdf, other]

    cs.SD cs.MM eess.AS

    A Snoring Sound Dataset for Body Position Recognition: Collection, Annotation, and Analysis

    Authors: Li Xiao, Xiuping Yang, Xinhong Li, Weiping Tu, Xiong Chen, Weiyan Yi, Jie Lin, Yuhong Yang, Yanzhen Ren

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic breathing disorder caused by a blockage in the upper airways. Snoring is a prominent symptom of OSAHS, and previous studies have attempted to identify the obstruction site of the upper airways by snoring sounds. Despite some progress, the classification of the obstruction site remains challenging in real-world clinical settings due to…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to INTERSPEECH 2023

  42. arXiv:2307.13295  [pdf, other]

    cs.SD eess.AS

    CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

    Authors: Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu

    Abstract: Recently, speech codecs based on neural networks have proven to perform better than traditional methods. However, redundancy in traditional parameter quantization is visible within the codec architecture of combining the traditional codec with the neural vocoder. In this paper, we propose a novel framework named CQNV, which combines the coarsely quantized parameters of a traditional parametric cod…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted by INTERSPEECH 2023

  43. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu , et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  44. arXiv:2305.16616  [pdf, other]

    eess.SP

    Channel Measurement, Modeling, and Simulation for 6G: A Survey and Tutorial

    Authors: Jianhua Zhang, Jiaxin Lin, Pan Tang, Yuxiang Zhang, Huixin Xu, Tianyang Gao, Haiyang Miao, Zeyong Chai, Zhengfu Zhou, Yi Li, Huiwen Gong, Yameng Liu, Zhiqiang Yuan, Lei Tian, Shaoshi Yang, Liang Xia, Guangyi Liu, Ping Zhang

    Abstract: The sixth generation (6G) mobile communications have attracted substantial attention in the global research community of information and communication technologies (ICT). 6G systems are expected to support not only extended 5G usage scenarios, but also new usage scenarios, such as integrated sensing and communication (ISAC), integrated artificial intelligence (AI) and communication, and communicat…

    Submitted 10 March, 2025; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 41 pages, 52 figures

  45. arXiv:2304.12704  [pdf, other]

    cs.SD cs.MM eess.AS

    GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

    Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genr…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP 2023. Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

  46. xURLLC-Aware Service Provisioning in Vehicular Networks: A Semantic Communication Perspective

    Authors: Le Xia, Yao Sun, Dusit Niyato, Daquan Feng, Lei Feng, Muhammad Ali Imran

    Abstract: Semantic communication (SemCom), as an emerging paradigm focusing on meaning delivery, has recently been considered a promising solution for the inevitable crisis of scarce communication resources. This trend stimulates us to explore the potential of applying SemCom to wireless vehicular networks, which normally consume a tremendous amount of resources to meet stringent reliability and latency req…

    Submitted 23 September, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: This paper has been accepted for publication by IEEE Transactions on Wireless Communications

  47. Joint User Association and Bandwidth Allocation in Semantic Communication Networks

    Authors: Le Xia, Yao Sun, Dusit Niyato, Xiaoqian Li, Muhammad Ali Imran

    Abstract: Semantic communication (SemCom) has recently been considered a promising solution to guarantee high resource utilization and transmission reliability for future wireless networks. Nevertheless, the unique demand for background knowledge matching makes it challenging to achieve efficient wireless resource management for multiple users in SemCom-enabled networks (SC-Nets). To this end, this paper in…

    Submitted 23 September, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted for publication by IEEE Transactions on Vehicular Technology

  48. arXiv:2212.12134  [pdf, other]

    eess.SP

    AMDET: Attention based Multiple Dimensions EEG Transformer for Emotion Recognition

    Authors: Yongling Xu, Yang Du, Jing Zou, Tianying Zhou, Lushan Xiao, Li Liu, Pengcheng

    Abstract: Affective computing is an important branch of artificial intelligence, and with the rapid development of brain computer interface technology, emotion recognition based on EEG signals has received broad attention. It is still a great challenge to effectively explore the multi-dimensional information in the EEG data in spite of a large number of deep learning methods. In this paper, we propose a dee…

    Submitted 22 December, 2022; originally announced December 2022.

  49. arXiv:2212.00687  [pdf]

    eess.IV

    3D-EPI Blip-Up/Down Acquisition (BUDA) with CAIPI and Joint Hankel Structured Low-Rank Reconstruction for Rapid Distortion-Free High-Resolution T2* Mapping

    Authors: Zhifeng Chen, Congyu Liao, Xiaozhi Cao, Benedikt A. Poser, Zhongbiao Xu, Wei-Ching Lo, Manyi Wen, Jaejin Cho, Qiyuan Tian, Yaohui Wang, Yanqiu Feng, Ling Xia, Wufan Chen, Feng Liu, Berkin Bilgic

    Abstract: Purpose: This work aims to develop a novel distortion-free 3D-EPI acquisition and image reconstruction technique for fast and robust, high-resolution, whole-brain imaging as well as quantitative T2* mapping. Methods: 3D-Blip-Up and -Down Acquisition (3D-BUDA) sequence is designed for both single- and multi-echo 3D GRE-EPI imaging using multiple shots with blip-up and -down readouts to encode B0 fi…

    Submitted 1 December, 2022; originally announced December 2022.

  50. arXiv:2211.13229  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG

    DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis

    Authors: Xian Wu, Shuxin Yang, Zhaopeng Qiu, Shen Ge, Yangtian Yan, Xingwang Wu, Yefeng Zheng, S. Kevin Zhou, Li Xiao

    Abstract: Fast screening and diagnosis are critical in COVID-19 patient treatment. In addition to the gold standard RT-PCR, radiological imaging like X-ray and CT also works as an important means in patient screening and follow-up. However, due to the excessive number of patients, writing reports becomes a heavy burden for radiologists. To reduce the workload of radiologists, we propose DeltaNet to generate…

    Submitted 12 November, 2022; originally announced November 2022.