Showing 1–12 of 12 results for author: Zhuo, L

  1. arXiv:2505.08221  [pdf, ps, other]

    eess.SP

    Performance Analysis of Cooperative Integrated Sensing and Communications for 6G Networks

    Authors: Dongsheng Sui, Cunhua Pan, Hong Ren, Jiahua Wan, Liuchang Zhuo, Jing Jin, Qixing Wang, Jiangzhou Wang

    Abstract: In this work, we aim to effectively characterize the performance of cooperative integrated sensing and communication (ISAC) networks and to reveal how performance metrics relate to network parameters. To this end, we introduce a generalized stochastic geometry framework to model the cooperative ISAC networks, which approximates the spatial randomness of the network deployment. Based on this framew…

    Submitted 13 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

  2. arXiv:2503.21254  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Vision-to-Music Generation: A Survey

    Authors: Zhaokai Wang, Chenxi Bao, Le Zhuo, Jingrui Han, Yang Yue, Yihong Tang, Victor Shea-Jay Huang, Yue Liao

    Abstract: Vision-to-music Generation, including video-to-music and image-to-music tasks, is a significant branch of multimodal artificial intelligence demonstrating vast application prospects in fields such as film scoring, short video creation, and dance music synthesis. However, compared to the rapid development of modalities like text and images, research in vision-to-music is still in its preliminary st…

    Submitted 27 March, 2025; originally announced March 2025.

  3. arXiv:2502.05559  [pdf, other]

    eess.SP

    Channel Estimation for RIS-Aided MU-MIMO mmWave Systems with Practical Hybrid Architecture

    Authors: Liuchang Zhuo, Cunhua Pan, Hong Ren, Ruisong Weng, Shi Jin, A. Lee Swindlehurst, Jiangzhou Wang

    Abstract: This paper proposes a correlation-based three-stage channel estimation strategy with low pilot overhead for reconfigurable intelligent surface (RIS)-aided millimeter wave (mmWave) multi-user (MU) MIMO systems, in which both users and base station (BS) are equipped with a hybrid RF architecture. In Stage I, all users jointly transmit pilots and recover the uncompressed received signals to estimate…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: 13 pages, 7 figures, 1 table

  4. arXiv:2412.09428  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

    Authors: Baisen Wang, Le Zhuo, Zhaokai Wang, Chenxi Bao, Wu Chengjing, Xuecheng Nie, Jiao Dai, Jizhong Han, Yue Liao, Si Liu

    Abstract: Multimodal music generation aims to produce music from diverse input modalities, including text, videos, and images. Existing methods use a common embedding space for multimodal fusion. Despite their effectiveness in other modalities, their application in multimodal music generation faces challenges of data scarcity, weak cross-modal alignment, and limited controllability. This paper addresses the…

    Submitted 12 December, 2024; originally announced December 2024.

  5. arXiv:2308.02915  [pdf, other]

    cs.GR cs.CV cs.SD eess.AS

    DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

    Authors: Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan

    Abstract: When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM MM 2023

  6. arXiv:2306.17103  [pdf, other]

    cs.CL cs.SD eess.AS

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    Authors: Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi LI, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language mo…

    Submitted 25 July, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 2 figures, 5 tables, accepted by ISMIR 2023

  7. arXiv:2306.10548  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MARBLE: Music Audio Representation Benchmark for Universal Evaluation

    Authors: Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

    Abstract: In the era of extensive intersection between art and Artificial Intelligence (AI), such as image generation and fiction co-creation, AI for music remains relatively nascent, particularly in music understanding. This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark. To address this issue…

    Submitted 23 November, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: camera-ready version for NeurIPS 2023

  8. arXiv:2211.11248  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Video Background Music Generation: Dataset, Method and Evaluation

    Authors: Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu

    Abstract: Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a comp…

    Submitted 4 August, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by ICCV2023

  9. arXiv:2207.05049  [pdf, other]

    cs.CV eess.IV

    Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

    Authors: Long Zhuo, Guangcong Wang, Shikai Li, Wayne Wu, Ziwei Liu

    Abstract: Video-to-Video synthesis (Vid2Vid) has achieved remarkable results in generating a photo-realistic video from a sequence of semantic maps. However, this pipeline suffers from high computational cost and long inference latency, which largely depends on two essential factors: 1) network architecture parameters, 2) sequential data stream. Recently, the parameters of image-based generative models have…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: ECCV 2022, Project Page: https://fast-vid2vid.github.io/ , Code: https://github.com/fast-vid2vid/fast-vid2vid

  10. arXiv:2008.08526  [pdf, other]

    eess.IV cs.CV

    Blur-Attention: A boosting mechanism for non-uniform blurred image restoration

    Authors: Xiaoguang Li, Feifan Yang, Kin Man Lam, Li Zhuo, Jiafeng Li

    Abstract: Dynamic scene deblurring is a challenging problem in computer vision. It is difficult to accurately estimate the spatially varying blur kernel by traditional methods. Data-driven-based methods usually employ kernel-free end-to-end mapping schemes, which are apt to overlook the kernel estimation. To address this issue, we propose a blur-attention module to dynamically capture the spatially varying…

    Submitted 19 August, 2020; originally announced August 2020.

  11. arXiv:2006.15588  [pdf, other]

    eess.IV cs.CV cs.LG

    A lateral semicircular canal segmentation based geometric calibration for human temporal bone CT Image

    Authors: Xiaoguang Li, Peng Fu, Hongxia Yin, ZhenChang Wang, Li Zhuo, Hui Zhang

    Abstract: Computed Tomography (CT) of the temporal bone has become an important method for diagnosing ear diseases. Due to the different posture of the subject and the settings of CT scanners, the CT image of the human temporal bone should be geometrically calibrated to ensure the symmetry of the bilateral anatomical structure. Manual calibration is a time-consuming task for radiologists and an important pr…

    Submitted 28 June, 2020; originally announced June 2020.

  12. arXiv:1903.09294  [pdf, other]

    eess.SP cs.NI

    Hybrid Precoder and Combiner for Imperfect Beam Alignment in mmWave MIMO Systems

    Authors: Chandan Pradhan, Ang Li, Li Zhuo, Yonghui Li, Branka Vucetic

    Abstract: In this letter, we aim to design a robust hybrid precoder and combiner against beam misalignment in millimeter-wave (mmWave) communication systems. We consider the inclusion of the 'error statistics' into the precoder and combiner design, where the array response that incorporates the distribution of the misalignment error is first derived. An iterative algorithm is then proposed to design the rob…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: 4 pages