Showing 1–20 of 20 results for author: Zhuang, H

Searching in archive eess.
  1. arXiv:2505.11817  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    AnalyticKWS: Towards Exemplar-Free Analytic Class Incremental Learning for Small-footprint Keyword Spotting

    Authors: Yang Xiao, Tianyi Peng, Rohan Kumar Das, Yuchen Hu, Huiping Zhuang

    Abstract: Keyword spotting (KWS) offers a vital mechanism to identify spoken commands in voice-enabled systems, where user demands often shift, requiring models to learn new keywords continually over time. However, a major problem is catastrophic forgetting, where models lose their ability to recognize earlier keywords. Although several continual learning methods have proven their usefulness for reducing fo…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025

  2. arXiv:2501.09352  [pdf, other]

    cs.LG cs.MM eess.IV

    PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

    Authors: Xianghu Yue, Yiming Chen, Xueyi Zhang, Xiaoxue Gao, Mengling Feng, Mingrui Lao, Huiping Zhuang, Haizhou Li

    Abstract: Multi-modal class-incremental learning (MMCIL) seeks to leverage multi-modal data, such as audio-visual and image-text pairs, thereby enabling models to learn continuously across a sequence of tasks while mitigating forgetting. While existing studies primarily focus on the integration and utilization of multi-modal information for MMCIL, a critical challenge remains: the issue of missing modalitie…

    Submitted 16 January, 2025; originally announced January 2025.

  3. arXiv:2411.04711  [pdf, other]

    cs.CV eess.IV

    Progressive Multi-Level Alignments for Semi-Supervised Domain Adaptation SAR Target Recognition Using Simulated Data

    Authors: Xinzheng Zhang, Hui Zhu, Hongqian Zhuang

    Abstract: Recently, an intriguing research trend for automatic target recognition (ATR) from synthetic aperture radar (SAR) imagery has arisen: using simulated data to train ATR models is a feasible solution to the issue of inadequate measured data. To close the domain gap that exists between the real and simulated data, the unsupervised domain adaptation (UDA) techniques are frequently exploited to constru…

    Submitted 7 November, 2024; originally announced November 2024.

  4. arXiv:2409.07224  [pdf, other]

    cs.SD eess.AS

    Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

    Authors: Xinyuan Qian, Xianghu Yue, Jiadong Wang, Huiping Zhuang, Haizhou Li

    Abstract: Sound Source Localization (SSL) is an enabling technology for applications such as surveillance and robotics. While traditional Signal Processing (SP)-based SSL methods provide analytic solutions under specific signal and noise assumptions, recent Deep Learning (DL)-based methods have significantly outperformed them. However, their success depends on extensive training data and substantial computational…

    Submitted 11 September, 2024; originally announced September 2024.

  5. arXiv:2408.11582  [pdf, other]

    cs.RO eess.SY

    Enhanced Visual SLAM for Collision-free Driving with Lightweight Autonomous Cars

    Authors: Zhihao Lin, Zhen Tian, Qi Zhang, Hanyang Zhuang, Jianglin Lan

    Abstract: The paper presents a vision-based obstacle avoidance strategy for lightweight self-driving cars that can be run on a CPU-only device using a single RGB-D camera. The method consists of two steps: visual perception and path planning. The visual perception part uses ORBSLAM3 enhanced with optical flow to estimate the car's poses and extract rich texture information from the scene. In the path planni…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 16 pages; Submitted to a journal

  6. arXiv:2408.08242  [pdf, ps, other]

    cs.RO cs.AI cs.LG eess.SY

    A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts

    Authors: Zhihao Lin, Zhen Tian, Qi Zhang, Ziyang Ye, Hanyang Zhuang, Jianglin Lan

    Abstract: Safety and efficiency are crucial for autonomous driving in roundabouts, especially in the context of mixed traffic where autonomous vehicles (AVs) and human-driven vehicles coexist. This paper introduces a learning-based algorithm tailored to foster safe and efficient driving behaviors across varying levels of traffic flows in roundabouts. The proposed algorithm employs a deep Q-learning network…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures, submitted to an IEEE journal

  7. arXiv:2403.05834  [pdf, other]

    cs.MM cs.SD eess.AS

    Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

    Authors: Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng

    Abstract: Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose Expressi…

    Submitted 9 March, 2024; originally announced March 2024.

  8. arXiv:2311.04769  [pdf]

    eess.IV cs.CV

    An attention-based deep learning network for predicting Platinum resistance in ovarian cancer

    Authors: Haoming Zhuang, Beibei Li, Jingtong Ma, Patrice Monkam, Shouliang Qi, Wei Qian, Dianning He

    Abstract: Background: Ovarian cancer is among the three most frequent gynecologic cancers globally. High-grade serous ovarian cancer (HGSOC) is the most common and aggressive histological type. Guided treatment for HGSOC typically involves platinum-based combination chemotherapy, necessitating an assessment of whether the patient is platinum-resistant. The purpose of this study is to propose a deep learning…

    Submitted 8 November, 2023; originally announced November 2023.

  9. arXiv:2305.11094  [pdf, other]

    cs.HC cs.CV cs.MM cs.SD eess.AS

    QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

    Authors: Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang

    Abstract: Speech-driven gesture generation is highly challenging due to the random jitters of human motion. In addition, there is an inherent asynchronous relationship between human speech and gestures. To tackle these challenges, we introduce a novel quantization-based and phase-guided motion-matching framework. Specifically, we first present a gesture VQ-VAE module to learn a codebook to summarize meaning…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 15 pages, 12 figures, CVPR 2023 Highlight

  10. arXiv:2304.12704  [pdf, other]

    cs.SD cs.MM eess.AS

    GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation based on Pre-trained Genre Token Network

    Authors: Haolin Zhuang, Shun Lei, Long Xiao, Weiqin Li, Liyang Chen, Sicheng Yang, Zhiyong Wu, Shiyin Kang, Helen Meng

    Abstract: Music-driven 3D dance generation has become an intensive research topic in recent years with great potential for real-world applications. Most existing methods lack the consideration of genre, which results in genre inconsistency in the generated dance movements. In addition, the correlation between the dance genre and the music has not been investigated. To address these issues, we propose a genr…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Accepted by ICASSP 2023. Demo page: https://im1eon.github.io/ICASSP23-GTNB-DG/

  11. arXiv:2304.08990  [pdf, other]

    eess.IV cs.CV

    A Comparison of Image Denoising Methods

    Authors: Zhaoming Kong, Fangxi Deng, Haomin Zhuang, Jun Yu, Lifang He, Xiaowei Yang

    Abstract: The advancement of imaging devices and the countless images generated every day pose an increasingly high demand on image denoising, which still remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic represe…

    Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiments. arXiv admin note: substantial text overlap with arXiv:2011.03462

  12. AccEar: Accelerometer Acoustic Eavesdropping with Unconstrained Vocabulary

    Authors: Pengfei Hu, Hui Zhuang, Panneer Selvam Santhalingam, Riccardo Spolaor, Parth Pathak, Guoming Zhang, Xiuzhen Cheng

    Abstract: With the increasing popularity of voice-based applications, acoustic eavesdropping has become a serious threat to users' privacy. While on smartphones access to microphones requires explicit user permission, acoustic eavesdropping attacks can rely on motion sensors (such as the accelerometer and gyroscope), access to which is unrestricted. However, previous instances of such attacks can only recogniz…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 2022 IEEE Symposium on Security and Privacy (SP)

    Journal ref: 2022 IEEE Symposium on Security and Privacy (SP)

  13. arXiv:2208.08757  [pdf, other]

    eess.AS cs.LG cs.SD

    Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

    Authors: SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

    Abstract: One-shot voice conversion (VC) with only a single target speaker's speech for reference has become a hot research topic. Existing works generally disentangle timbre, while information about pitch, rhythm and content is still mixed together. To perform one-shot VC effectively with further disentangling these speech components, we employ random resampling for pitch and content encoder and use the va…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: 5 pages, 5 figures, INTERSPEECH 2022

  14. arXiv:2206.01599  [pdf, other]

    eess.SP

    A Deep-Learning Usability Expansion Model of Ocean Observations

    Authors: Ali Muhamed Ali, Hanqi Zhuang, Yu Huang, Ali K. Ibrahim, Ali Salem Altaher, Laurent Chérubin

    Abstract: Today's ocean numerical prediction skills depend on the availability of in-situ and remote ocean observations at the time of the predictions only. Because observations are scarce and discontinuous in time and space, numerical models are often unable to accurately model and predict real ocean dynamics, leading to a lack of fulfillment of a range of services that require reliable predictions at vari…

    Submitted 3 June, 2022; originally announced June 2022.

    Comments: 34 pages, 14 figures, one table

  15. arXiv:2008.01798  [pdf, other]

    cs.LG eess.IV eess.SP stat.ML

    Physics-informed Tensor-train ConvLSTM for Volumetric Velocity Forecasting of Loop Current

    Authors: Yu Huang, Yufei Tang, Hanqi Zhuang, James VanZwieten, Laurent Cherubin

    Abstract: According to the National Academies, a weekly forecast of velocity, vertical structure, and duration of the Loop Current (LC) and its eddies is critical for understanding the oceanography and ecosystem, and for mitigating outcomes of anthropogenic and natural disasters in the Gulf of Mexico (GoM). However, this forecast is a challenging problem since the LC behaviour is dominated by long-range spa…

    Submitted 18 December, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

    Comments: 10 pages, 5 figures

    Journal ref: Front. Artif. Intell. 4(2021)197

  16. arXiv:2007.09478  [pdf]

    cs.CV cs.LG eess.IV

    Classification of Diabetic Retinopathy via Fundus Photography: Utilization of Deep Learning Approaches to Speed up Disease Detection

    Authors: Hangwei Zhuang, Nabil Ettehadi

    Abstract: In this paper, we propose two distinct solutions to the problem of Diabetic Retinopathy (DR) classification. In the first approach, we introduce a shallow neural network architecture. This model performs well on classification of the most frequent classes but fails at classifying the less frequent ones. In the second approach, we use transfer learning to re-train the last modified layer of a ver…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 6 pages, 9 figures

    ACM Class: I.4.6; I.4.9

  17. arXiv:2006.10159  [pdf, other]

    physics.ins-det cs.LG eess.IV eess.SP hep-ex

    Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

    Authors: Claudionor N. Coelho Jr., Aki Kuusela, Shan Li, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Adrian Alan Pol, Sioni Summers

    Abstract: Although the quest for more accurate solutions is pushing deep learning research towards larger and more complex algorithms, edge devices demand efficient inference and therefore reduction in model size, latency and energy consumption. One technique to limit model size is quantization, which implies using fewer bits to represent weights and biases. Such an approach usually results in a decline in…

    Submitted 21 June, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Journal ref: Nature Machine Intelligence, Volume 3 (2021)

  18. arXiv:2005.08356  [pdf]

    eess.AS cs.SD

    North Atlantic Right Whales Up-call Detection Using Multimodel Deep Learning

    Authors: Ali K Ibrahim, Hanqi Zhuang, Laurent M. Chérubin, Nurgun Erdol, Gregory O Corry-Crowe, Ali Muhamed Ali

    Abstract: A new method for North Atlantic Right Whales (NARW) up-call detection using Multimodel Deep Learning (MMDL) is presented in this paper. In this approach, signals from passive acoustic sensors are first converted to spectrogram and scalogram images, which are time-frequency representations of the signals. These images are in turn used to train an MMDL detector, consisting of Convolutional Neural N…

    Submitted 17 May, 2020; originally announced May 2020.

  19. arXiv:2001.00170  [pdf]

    eess.IV cs.CV

    Residual Block-based Multi-Label Classification and Localization Network with Integral Regression for Vertebrae Labeling

    Authors: Chunli Qin, Demin Yao, Han Zhuang, Hui Wang, Yonghong Shi, Zhijian Song

    Abstract: Accurate identification and localization of the vertebrae in CT scans is a critical and standard preprocessing step for clinical spinal diagnosis and treatment. Existing methods are mainly based on the integration of multiple neural networks, and most of them use the Gaussian heat map to locate the vertebrae's centroid. However, the process of obtaining the vertebrae's centroid coordinates using h…

    Submitted 1 January, 2020; originally announced January 2020.

    Comments: 10 pages with 9 figures

  20. arXiv:0908.1273  [pdf, other]

    math.OC cs.NI eess.SY

    A General Class of Throughput Optimal Routing Policies in Multi-hop Wireless Networks

    Authors: Mohammad Naghshvar, Hairuo Zhuang, Tara Javidi

    Abstract: This paper considers the problem of throughput optimal routing/scheduling in a multi-hop constrained queueing network with random connectivity whose special case includes opportunistic multi-hop wireless networks and input-queued switch fabrics. The main challenge in the design of throughput optimal routing policies is closely related to identifying appropriate and universal Lyapunov functions wit…

    Submitted 10 March, 2011; v1 submitted 10 August, 2009; originally announced August 2009.

    Comments: 31 pages (one column), 8 figures, (revision submitted to IEEE Transactions on Information Theory)

    MSC Class: 34D20