
Showing 1–11 of 11 results for author: Zhan, H

Searching in archive eess.
  1. arXiv:2505.09511  [pdf, ps, other]

    cs.RO cs.MA eess.SY

    Design of a Formation Control System to Assist Human Operators in Flying a Swarm of Robotic Blimps

    Authors: Tianfu Wu, Jiaqi Fu, Wugang Meng, Sungjin Cho, Huanzhe Zhan, Fumin Zhang

    Abstract: Formation control is essential for swarm robotics, enabling coordinated behavior in complex environments. In this paper, we introduce a novel formation control system for an indoor blimp swarm using a specialized leader-follower approach enhanced with a dynamic leader-switching mechanism. This strategy allows any blimp to take on the leader role, distributing maneuvering demands across the swarm a…

    Submitted 14 May, 2025; originally announced May 2025.

  2. arXiv:2409.00750  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

    Authors: Yuancheng Wang, Haoyue Zhan, Liwei Liu, Ruihong Zeng, Haotian Guo, Jiachen Zheng, Qiang Zhang, Xueyao Zhang, Shunsi Zhang, Zhizheng Wu

    Abstract: The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive and non-autoregressive systems. The autoregressive systems implicitly model duration but exhibit certain deficiencies in robustness and lack of duration controllability. Non-autoregressive systems require explicit alignment information between text and speech during training and predict durations for linguist…

    Submitted 20 October, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

  3. arXiv:2404.10365  [pdf, other]

    cs.NI cs.LG eess.SP

    Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments

    Authors: Yongming Huang, Xiaohu You, Hang Zhan, Shiwen He, Ningning Fu, Wei Xu

    Abstract: Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, as wireless networks generate a multitude of data fields and indicators during operation, only…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 figures

  4. arXiv:2401.10747  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach

    Authors: Weide Liu, Huijing Zhan, Hao Chen, Fengmao Lv

    Abstract: Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate betw…

    Submitted 17 February, 2025; v1 submitted 28 December, 2023; originally announced January 2024.

    Comments: We request to withdraw our paper from the archive due to significant errors identified in the analysis and conclusions. Upon further review, we realized that these errors undermine the validity of our findings. We plan to conduct additional research to correct these issues and resubmit a revised version in the future

  5. arXiv:2308.16021  [pdf, other]

    cs.SD eess.AS

    CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis

    Authors: Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng

    Abstract: To further improve the speaking styles of synthesized speeches, current text-to-speech (TTS) synthesis systems commonly employ reference speeches to stylize their outputs instead of just the input texts. These reference speeches are obtained by manual selection which is resource-consuming, or selected by semantic features. However, semantic features contain not only style-related information, but…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Accepted by Interspeech 2022

  6. arXiv:2202.02732  [pdf, other]

    eess.IV physics.optics

    Diffractive deep neural network based adaptive optics scheme for vortex beam in oceanic turbulence

    Authors: Haichao Zhan, Le Wang, Wennai Wang, Shengmei Zhao

    Abstract: Vortex beam carrying orbital angular momentum (OAM) is disturbed by oceanic turbulence (OT) when propagating in underwater wireless optical communication (UWOC) system. Adaptive optics (AO) is used to compensate for distortion and improve the performance of the UWOC system. In this work, we propose a diffractive deep neural network (DDNN) based AO scheme to compensate for the distortion caused by…

    Submitted 6 February, 2022; originally announced February 2022.

  7. arXiv:2110.07192  [pdf, other]

    eess.AS cs.SD

    Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

    Authors: Haoyue Zhan, Xinyuan Yu, Haitong Zhang, Yang Zhang, Yue Lin

    Abstract: In this paper, we study the disentanglement of speaker and language representations in non-autoregressive cross-lingual TTS models from various aspects. We propose a phoneme length regulator that solves the length mismatch problem between IPA input sequence and monolingual alignment results. Using the phoneme length regulator, we present a FastPitch-based cross-lingual model with IPA symbols as in…

    Submitted 30 August, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by Interspeech 2022

  8. arXiv:2110.07187  [pdf, other]

    cs.CL cs.SD eess.AS

    Revisiting IPA-based Cross-lingual Text-to-speech

    Authors: Haitong Zhang, Haoyue Zhan, Yang Zhang, Xinyuan Yu, Yue Lin

    Abstract: International Phonetic Alphabet (IPA) has been widely used in cross-lingual text-to-speech (TTS) to achieve cross-lingual voice cloning (CL VC). However, IPA itself has been understudied in cross-lingual TTS. In this paper, we report some empirical findings of building a cross-lingual TTS model using IPA as inputs. Experiments show that the way to process the IPA and suprasegmental sequence has a…

    Submitted 18 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022

  9. arXiv:1910.01919  [pdf, other]

    eess.SY cs.LG math.OC

    Relationship Explainable Multi-objective Optimization Via Vector Value Function Based Reinforcement Learning

    Authors: Huixin Zhan, Yongcan Cao

    Abstract: Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple, yet often conflicting objectives. A typical approach to obtain optimal policies is to first construct a loss function that is based on the scalarization of individual objectives, and then find the optimal policy that minimizes the loss. Ho…

    Submitted 2 October, 2019; originally announced October 2019.

    Comments: COLT19 submission. arXiv admin note: substantial text overlap with arXiv:1909.12268

  10. arXiv:1909.12268  [pdf, other]

    eess.SY cs.LG math.OC

    Relationship Explainable Multi-objective Reinforcement Learning with Semantic Explainability Generation

    Authors: Huixin Zhan, Yongcan Cao

    Abstract: Solving multi-objective optimization problems is important in various applications where users are interested in obtaining optimal policies subject to multiple, yet often conflicting objectives. A typical approach to obtain optimal policies is to first construct a loss function that is based on the scalarization of individual objectives, and then find the optimal policy that minimizes the loss. Ho…

    Submitted 26 September, 2019; originally announced September 2019.

  11. arXiv:1906.07861  [pdf]

    cs.CY eess.SY

    Controllable Planning, Responsibility, and Information in Automatic Driving Technology

    Authors: Dan Wan, Hao Zhan

    Abstract: People hope automated driving technology should be always in a stable and controllable state, accurately, which can be divided into controllable planning, responsibility, and information. Otherwise, it would bring about the problems of tram dilemma, responsibility attribution, information leakage, and security. This article discusses these three types of issues separately and clarifies some misund…

    Submitted 27 June, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

    Comments: The 7th International Symposium on Project Management (ISPM2019)