Showing 1–50 of 76 results for author: Zhu, B

Searching in archive eess.
  1. arXiv:2506.18460  [pdf, ps, other]

    eess.SY

    Networked pointing system: Bearing-only target localization and pointing control

    Authors: Shiyao Li, Bo Zhu, Yining Zhou, Jie Ma, Baoqing Yang, Fenghua He

    Abstract: In the paper, we formulate the target-pointing consensus problem where the headings of agents are required to point at a common target. Only a few agents in the network can measure the bearing information of the target. A two-step solution consisting of a bearing-only estimator for target localization and a control law for target pointing is constructed to address this problem. Compared to the str…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: IFAC Conference on Networked Systems, 2025

  2. arXiv:2506.10459  [pdf, ps, other]

    cs.CV eess.IV

    Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Intermediate Feature Distance

    Authors: Chun Liu, Bingqian Zhu, Tao Xu, Zheng Zheng, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang

    Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, which pose security challenges to hyperspectral image (HSI) classification technologies based on DNNs. In the domain of natural images, numerous transfer-based adversarial attack methods have been studied. However, HSIs differ from natural images due to their high-dimensional and rich spectral information. Current research on HSI a…

    Submitted 12 June, 2025; originally announced June 2025.

  3. arXiv:2504.20454  [pdf]

    eess.IV cs.CV

    LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight

    Authors: Jiajun Ding, Beiyao Zhu, Xiaosheng Liu, Lishen Zhang, Zhao Liu

    Abstract: This study integrates PET metabolic information with CT anatomical structures to establish a 3D multimodal segmentation dataset for lymphoma based on whole-body FDG PET/CT examinations, which addresses the lack of standardised multimodal segmentation datasets in the field of haematological malignancies. We retrospectively collected 483 examination datasets acquired between March 2011 and…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: 17 pages, 4 figures

  4. arXiv:2504.15081  [pdf, other]

    eess.SY

    PID-GM: PID Control with Gain Mapping

    Authors: Bo Zhu, Wei Yu, Hugh H. T. Liu

    Abstract: Proportional-Integral-Differential (PID) control is widely used in industrial control systems. However, up to now there are at least two open problems related to PID control. One is to have a comprehensive understanding of its robustness with respect to model uncertainties and disturbances. The other is to build intuitive, explicit and mathematically provable guidelines for PID gain tuning. In t…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 8 pages, 7 figures

  5. arXiv:2504.04721  [pdf, other]

    eess.AS

    Bridging the Gap between Continuous and Informative Discrete Representations by Random Product Quantization

    Authors: Xueqing Li, Zehan Li, Boyu Zhu, Ruihao Jing, Jian Kang, Jie Li, Xiao-Lei Zhang, Xuelong Li

    Abstract: Self-supervised learning has become a core technique in speech processing, but the high dimensionality of its representations makes discretization essential for improving efficiency. However, existing discretization methods still suffer from significant information loss, resulting in a notable performance gap compared to continuous representations. To overcome these limitations, we propose two qua…

    Submitted 7 April, 2025; originally announced April 2025.

  6. arXiv:2502.11219  [pdf, other]

    eess.AS cs.SD

    AudioSpa: Spatializing Sound Events with Text

    Authors: Linfeng Feng, Lei Zhao, Boyu Zhu, Xiao-Lei Zhang, Xuelong Li

    Abstract: Text-to-audio (TTA) systems have recently demonstrated strong performance in synthesizing monaural audio from text. However, the task of generating binaural spatial audio from text, which provides a more immersive auditory experience by incorporating the sense of spatiality, has not been explored yet. In this work, we introduce text-guided binaural audio generation. As an early effort, we focus o…

    Submitted 16 February, 2025; originally announced February 2025.

  7. arXiv:2410.12358  [pdf, ps, other]

    eess.SP

    Line Spectral Analysis Using the G-Filter: An Atomic Norm Minimization Approach

    Authors: Bin Zhu

    Abstract: The area of spectral analysis has a traditional dichotomy between continuous spectra (spectral densities) which correspond to purely nondeterministic processes, and line spectra (Dirac impulses) which represent sinusoids. While the former case is important in the identification of discrete-time linear stochastic systems, the latter case is essential for the analysis and modeling of time series wit…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 17 pages, 8 figures. Submitted to Automatica

  8. arXiv:2410.12349  [pdf, ps, other]

    eess.SP

    When atomic norm meets the G-filter: A general framework for line spectral estimation

    Authors: Bin Zhu, Jiale Tang

    Abstract: This paper proposes a novel approach for line spectral estimation which combines Georgiou's filter bank (G-filter) with atomic norm minimization (ANM). A key ingredient is a Carathéodory–Fejér-type decomposition for the covariance matrix of the filter output. The resulting optimization problem can be characterized via semidefinite programming and contains the standard ANM for line spectral estima…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 5 pages, 3 figures. Submitted to the Satellite Workshop HiPeCASP of ICASSP 2025

  9. arXiv:2409.01199  [pdf, other]

    cs.CV eess.IV

    OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

    Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ign…

    Submitted 9 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

  10. arXiv:2405.20055  [pdf, other]

    eess.SY

    Hypergraph-Aided Task-Resource Matching for Maximizing Value of Task Completion in Collaborative IoT Systems

    Authors: Botao Zhu, Xianbin Wang

    Abstract: With the growing scale and intrinsic heterogeneity of Internet of Things (IoT) systems, distributed device collaboration becomes essential for effective task completion by dynamically utilizing limited communication and computing resources. However, the separated design and situation-agnostic operation of computing, communication and application layers create a fundamental challenge for rapid task…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper has been published in IEEE Transactions on Mobile Computing, May 2024

  11. Improved Soft-k-Means Clustering Algorithm for Balancing Energy Consumption in Wireless Sensor Networks

    Authors: Botao Zhu, Ebrahim Bedeer, Ha H. Nguyen, Robert Barton, Jerome Henry

    Abstract: Energy load balancing is an essential issue in designing wireless sensor networks (WSNs). Clustering techniques are utilized as energy-efficient methods to balance the network energy and prolong its lifetime. In this paper, we propose an improved soft-k-means (IS-k-means) clustering algorithm to balance the energy consumption of nodes in WSNs. First, we use the idea of "clustering by fast search…

    Submitted 22 March, 2024; originally announced March 2024.

    Journal ref: Published in IEEE Internet of Things Journal, 2021

  12. arXiv:2403.06423  [pdf, other]

    eess.SP cs.RO

    LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association

    Authors: Guanhua Ding, Jianan Liu, Yuxuan Xia, Tao Huang, Bing Zhu, Jinping Sun

    Abstract: Multiple extended target tracking (ETT) has gained increasing attention due to the development of high-precision LiDAR and radar sensors in automotive applications. For LiDAR point cloud-based vehicle tracking, this paper presents a probabilistic measurement-region association (PMRA) ETT model, which can describe the complex measurement distribution by partitioning the target extent into different…

    Submitted 18 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures, accepted by the 27th International Conference on Information Fusion (FUSION 2024)

  13. arXiv:2402.17785  [pdf, other]

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Eval…

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  14. arXiv:2402.07485  [pdf, other]

    cs.SD eess.AS

    MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

    Authors: Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma

    Abstract: In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in developing generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instructio…

    Submitted 11 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  15. arXiv:2310.18767  [pdf, other]

    eess.SP

    Enhancing Epileptic Seizure Detection with EEG Feature Embeddings

    Authors: Arman Zarei, Bingzhao Zhu, Mahsa Shoaran

    Abstract: Epilepsy is one of the most prevalent brain disorders that disrupts the lives of millions worldwide. For patients with drug-resistant seizures, there exist implantable devices capable of monitoring neural activity, promptly triggering neurostimulation to regulate seizures, or alerting patients of potential episodes. Next-generation seizure detection systems heavily rely on high-accuracy machine le…

    Submitted 28 October, 2023; originally announced October 2023.

  16. arXiv:2310.10159  [pdf, other]

    cs.SD cs.CL eess.AS

    Joint Music and Language Attention Models for Zero-shot Music Tagging

    Authors: Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

    Abstract: Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on closed-set music tagging tasks which cannot be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Keywords: Music tagging, joint music and language attention models, Music Foundation Model.

  17. arXiv:2309.06036  [pdf, other]

    eess.SP

    Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar?

    Authors: Jianan Liu, Guanhua Ding, Yuxuan Xia, Jinping Sun, Tao Huang, Lihua Xie, Bing Zhu

    Abstract: Online 3D multi-object tracking (MOT) has recently received significant research interest due to the expanding demand for 3D perception in advanced driver assistance systems (ADAS) and autonomous driving (AD). Among the existing 3D MOT frameworks for ADAS and AD, conventional point object tracking (POT) framework using the tracking-by-detection (TBD) strategy has been well studied and accepted for…

    Submitted 25 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 8 pages, 5 figures, accepted by IEEE 35th Intelligent Vehicles Symposium (IV 2024), oral presentation (top 5%), code is available at https://github.com/dinggh0817/4D_Radar_MOT

  18. arXiv:2306.04970  [pdf, other]

    cs.RO eess.SY

    Motion Planning for Aerial Pick-and-Place based on Geometric Feasibility Constraints

    Authors: Huazi Cao, Jiahao Shen, Cunjia Liu, Bo Zhu, Shiyu Zhao

    Abstract: This paper studies the motion planning problem of the pick-and-place of an aerial manipulator that consists of a quadcopter flying base and a Delta arm. We propose a novel partially decoupled motion planning framework to solve this problem. Compared to the state-of-the-art approaches, the proposed one has two novel features. First, it does not suffer from increased computation in high-dimensional…

    Submitted 8 June, 2023; originally announced June 2023.

  19. arXiv:2306.02231  [pdf, other]

    cs.CL cs.AI cs.LG eess.SY

    Fine-Tuning Language Models with Advantage-Induced Policy Alignment

    Authors: Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach to aligning large language models (LLMs) to human preferences. Among the plethora of RLHF techniques, proximal policy optimization (PPO) is one of the most widely used methods. Despite its popularity, however, PPO may suffer from mode collapse, instability, and poor sample efficiency. We show that these issues can be…

    Submitted 2 November, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  20. arXiv:2306.02003  [pdf, other]

    cs.LG cs.AI cs.PF eess.SY stat.ML

    On Optimal Caching and Model Multiplexing for Large Model Inference

    Authors: Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark Barrett, Michael I. Jordan, Jiantao Jiao

    Abstract: Large Language Models (LLMs) and other large foundation models have achieved noteworthy success, but their size exacerbates existing resource consumption and latency challenges. In particular, the large-scale deployment of these models is hindered by the significant resource requirements during inference. In this paper, we study two approaches for mitigating these challenges: employing a cache to…

    Submitted 28 August, 2023; v1 submitted 3 June, 2023; originally announced June 2023.

  21. arXiv:2306.00265  [pdf, other]

    cs.LG cs.AI cs.CV eess.IV stat.ML

    Doubly Robust Self-Training

    Authors: Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

    Abstract: Self-training is an important technique for solving semi-supervised learning problems. It leverages unlabeled data by generating pseudo-labels and combining them with a limited labeled dataset for training. The effectiveness of self-training heavily relies on the accuracy of these pseudo-labels. In this paper, we introduce doubly robust self-training, a novel semi-supervised algorithm that provabl…

    Submitted 2 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  22. arXiv:2305.08247  [pdf]

    eess.SY

    A Fast and Robust Camera-IMU Online Calibration Method For Localization System

    Authors: Xiaowen Tao, Pengxiang Meng, Bing Zhu, Jian Zhao

    Abstract: Autonomous driving has spurred the development of sensor fusion techniques, which combine data from multiple sensors to improve system performance. In particular, localization system based on sensor fusion, such as Visual Simultaneous Localization and Mapping (VSLAM), is an important component in environment perception, and is the basis of decision-making and motion control for intelligent vehicl…

    Submitted 14 May, 2023; originally announced May 2023.

  23. arXiv:2305.07618  [pdf]

    cs.CV cs.LG eess.IV

    Uncertainty Estimation and Out-of-Distribution Detection for Deep Learning-Based Image Reconstruction using the Local Lipschitz

    Authors: Danyal F. Bhutto, Bo Zhu, Jeremiah Z. Liu, Neha Koonjoo, Hongwei B. Li, Bruce R. Rosen, Matthew S. Rosen

    Abstract: Accurate image reconstruction is at the heart of diagnostics in medical imaging. Supervised deep learning-based approaches have been investigated for solving inverse problems including image reconstruction. However, these trained models encounter unseen data distributions that are widely shifted from training data during deployment. Therefore, it is essential to assess whether a given input falls…

    Submitted 1 December, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

  24. arXiv:2303.11692  [pdf, other]

    cs.SD cs.IR eess.AS

    ByteCover3: Accurate Cover Song Identification on Short Queries

    Authors: Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma

    Abstract: Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the burgeoning of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and w…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  25. arXiv:2301.06784  [pdf, other]

    eess.SP math.ST

    On the Statistical Consistency of a Generalized Cepstral Estimator

    Authors: Bin Zhu, Mattia Zorzi

    Abstract: We consider the problem of estimating the generalized cepstral coefficients of a stationary stochastic process or stationary multidimensional random field. It turns out that a naive version of the periodogram-based estimator for the generalized cepstral coefficients is not consistent. We propose a consistent estimator for those coefficients. Moreover, we show that the latter can be used in order to…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: 11 pages in IEEE Transactions template, 4 figures. Submitted to IEEE Transactions on Automatic Control

  26. arXiv:2212.05301  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

    Authors: Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng

    Abstract: Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating the noise-robustness of speech recognition. Mainstream methods focus on fusing audio and visual inputs to obtain modality-invariant representations. However, such representations are prone to over-reliance on audio modality as it is much easier to recognize than video modality in clean conditions. As a result, th…

    Submitted 2 February, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted by AAAI2023

  27. arXiv:2212.03540  [pdf, other]

    eess.SY cs.RO

    EASpace: Enhanced Action Space for Policy Transfer

    Authors: Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

    Abstract: Formulating expert policies as macro actions promises to alleviate the long-horizon issue via structured exploration and efficient credit assignment. However, traditional option-based multi-policy transfer methods suffer from inefficient exploration of macro action's length and insufficient exploitation of useful long-duration macro actions. In this paper, a novel algorithm named EASpace (Enhanced…

    Submitted 24 July, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 15 Pages

  28. Control Lyapunov-Barrier Function Based Model Predictive Control for Stochastic Nonlinear Affine Systems

    Authors: Weijiang Zheng, Bing Zhu

    Abstract: A stochastic model predictive control (MPC) framework is presented in this paper for nonlinear affine systems with stability and feasibility guarantees. We first introduce the concept of stochastic control Lyapunov-barrier function (CLBF) and provide a method to construct CLBF by combining an unconstrained control Lyapunov function (CLF) and control barrier functions. The unconstrained CLF is obtai…

    Submitted 26 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 21 pages, 6 figures

    Journal ref: International Journal of Robust and Nonlinear Control, 2024

  29. arXiv:2208.14372  [pdf, ps, other]

    math.OC eess.SY

    Dead-beat model predictive control for discrete-time linear systems

    Authors: Bing Zhu

    Abstract: In this paper, model predictive control (MPC) strategies are proposed for dead-beat control of linear systems with and without state and control constraints. In unconstrained MPC, deadbeat performance can be guaranteed by setting the control horizon to the system dimension, and adding a terminal equality constraint. It is proved that the unconstrained deadbeat MPC is equivalent to linear deadbeat…

    Submitted 30 August, 2022; originally announced August 2022.

  30. arXiv:2208.10059  [pdf, ps, other]

    stat.ME eess.SY

    Sampling Gaussian Stationary Random Fields: A Stochastic Realization Approach

    Authors: Bin Zhu, Jiahao Liu, Zhengshou Lai, Tao Qian

    Abstract: Generating large-scale samples of stationary random fields is of great importance in fields such as geomaterial modeling and uncertainty quantification. Traditional methodologies based on covariance matrix decomposition have the difficulty of being computationally expensive, which is even more serious when the dimension of the random field is large. This paper proposes an efficient stochastic re…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: 17 pages, 9 figures

  31. arXiv:2206.10255  [pdf, other]

    eess.SY cs.CV

    GNN-PMB: A Simple but Effective Online 3D Multi-Object Tracker without Bells and Whistles

    Authors: Jianan Liu, Liping Bai, Yuxuan Xia, Tao Huang, Bing Zhu, Qing-Long Han

    Abstract: Multi-object tracking (MOT) is among the crucial applications in modern advanced driver assistance systems (ADAS) and autonomous driving (AD) systems. The global nearest neighbor (GNN) filter, as the earliest random vector-based Bayesian tracking framework, has been adopted in most state-of-the-art trackers in the automotive industry. The development of random finite set (RFS) theory facilitates a…

    Submitted 8 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: accepted by IEEE Transactions on Intelligent Vehicles

  32. Brachial Plexus Nerve Trunk Segmentation Using Deep Learning: A Comparative Study with Doctors' Manual Segmentation

    Authors: Yu Wang, Binbin Zhu, Lingsi Kong, Jianlin Wang, Bin Gao, Jianhua Wang, Dingcheng Tian, Yudong Yao

    Abstract: Ultrasound-guided nerve block anesthesia (UGNB) is a high-tech visual nerve block anesthesia method that can observe the target nerve and its surrounding structures, the puncture needle's advancement, and local anesthetics spread in real-time. The key in UGNB is nerve identification. With the help of deep learning methods, the automatic identification or segmentation of nerves can be realized, ass…

    Submitted 17 May, 2022; originally announced May 2022.

    Comments: 9 pages

    Journal ref: Ultrasound in Medicine & Biology, 2024, 50(3): 374-383

  33. NeuralTree: A 256-Channel 0.227-μJ/Class Versatile Neural Activity Classification and Closed-Loop Neuromodulation SoC

    Authors: Uisub Shin, Cong Ding, Bingzhao Zhu, Yashwanth Vyza, Alix Trouillet, Emilie C. M. Revol, Stéphanie P. Lacour, Mahsa Shoaran

    Abstract: Closed-loop neural interfaces with on-chip machine learning can detect and suppress disease symptoms in neurological disorders or restore lost functions in paralyzed patients. While high-density neural recording can provide rich neural activity information for accurate disease-state detection, existing systems have low channel counts and poor scalability, which could limit their therapeutic effica…

    Submitted 8 December, 2022; v1 submitted 12 May, 2022; originally announced May 2022.

    Journal ref: IEEE Journal of Solid-State Circuits, vol. 57, no. 11, pp. 3243-3257, Nov. 2022

  34. arXiv:2205.02999  [pdf, ps, other]

    eess.SP

    Fast and Arbitrary Beam Pattern Design for RIS-Assisted Terahertz Wireless Communication

    Authors: Jian Dang, Zaichen Zhang, Yewei Li, Liang Wu, Bingcheng Zhu, Lei Wang

    Abstract: Reconfigurable intelligent surface (RIS) can assist terahertz wireless communication to restore the fragile line-of-sight links and facilitate beam steering. Arbitrary reflection beam patterns are desired to meet diverse requirements in different applications. This paper establishes a relationship between RIS beam pattern design and two-dimensional finite impulse response filter design and proposes…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: 5 pages, 5 figures

  35. arXiv:2204.14057  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

    Authors: Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

    Abstract: We present an approach to learn voice-face representations from the talking face videos, without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face. These methods neglect the semantic content of different videos, introducing false-negative pairs as training noise. Furthermore, the positive pairs are constructed based…

    Submitted 26 May, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: 8 pages, 4 figures. Accepted by IJCAI-2022

  36. arXiv:2203.03863  [pdf, ps, other]

    eess.SP

    Amplitude-Constrained Constellation and Reflection Pattern Designs for Directional Backscatter Communications Using Programmable Metasurface

    Authors: Wei Wang, Bincheng Zhu, Yongming Huang, Wei Zhang

    Abstract: The large scale reflector array of programmable metasurfaces is capable of increasing the power efficiency of backscatter communications via passive beamforming and thus has the potential to revolutionize the low-data-rate nature of backscatter communications. In this paper, we propose to design the power-efficient higher-order constellation and reflection pattern under the amplitude constraint br…

    Submitted 30 March, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted in IEEE Transactions on Wireless Communications

  37. arXiv:2202.10139  [pdf, other]

    eess.AS cs.IR cs.SD

    S3T: Self-Supervised Pre-training with Swin Transformer for Music Classification

    Authors: Hang Zhao, Chen Zhang, Belei Zhu, Zejun Ma, Kejun Zhang

    Abstract: In this paper, we propose S3T, a self-supervised pre-training method with Swin Transformer for music classification, aiming to learn meaningful music representations from massive easily accessible unlabeled music data. S3T introduces a momentum-based paradigm, MoCo, with Swin Transformer as its feature extractor to music time-frequency domain. For better music representation learning, S3T contrib…

    Submitted 21 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP2022

  38. arXiv:2202.05267  [pdf, other]

    physics.med-ph cs.CV eess.IV

    On Real-time Image Reconstruction with Neural Networks for MRI-guided Radiotherapy

    Authors: David E. J. Waddington, Nicholas Hindley, Neha Koonjoo, Christopher Chiu, Tess Reynolds, Paul Z. Y. Liu, Bo Zhu, Danyal Bhutto, Chiara Paganelli, Paul J. Keall, Matthew S. Rosen

    Abstract: MRI-guidance techniques that dynamically adapt radiation beams to follow tumor motion in real-time will lead to more accurate cancer treatments and reduced collateral healthy tissue damage. The gold-standard for reconstruction of undersampled MR data is compressed sensing (CS) which is computationally slow and limits the rate that images can be available for real-time adaptation. Here, we demonstr…

    Submitted 18 May, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: 12 pages, 6 figures, 1 table. v2 has a typo in eqn 1 corrected and references added to the discussion

  39. arXiv:2202.01269  [pdf, ps, other]

    cs.LG eess.SP math.ST stat.CO stat.ML

    Robust Estimation for Nonparametric Families via Generative Adversarial Networks

    Authors: Banghua Zhu, Jiantao Jiao, Michael I. Jordan

    Abstract: We provide a general framework for designing Generative Adversarial Networks (GANs) to solve high dimensional robust statistics problems, which aim at estimating unknown parameters of the true distribution given adversarially corrupted samples. Prior work focuses on the problem of robust mean and covariance estimation when the true distribution lies in the family of Gaussian distributions or elliptic…

    Submitted 2 February, 2022; originally announced February 2022.

  40. arXiv:2202.00874  [pdf, other]

    cs.SD cs.AI cs.IR cs.LG eess.AS

    HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in au…

    Submitted 1 February, 2022; originally announced February 2022.

    Comments: Preprint version for ICASSP 2022, Singapore

  41. arXiv:2201.08563  [pdf, other]

    eess.SP

    Performance Analysis of Hybrid RF-Reconfigurable Intelligent Surfaces Assisted FSO Communication

    Authors: Haibo Wang, Zaichen Zhang, Bingcheng Zhu, Yidi Zhang

    Abstract: Optical reconfigurable intelligent surface (ORIS) is an emerging technology that can achieve reconfigurable optical propagation environments by precisely adjusting signal's reflection and shape through a large number of passive reflecting elements. In this paper, we investigate the performance of an ORIS-assisted dual-hop hybrid radio frequency (RF) and free space optics (FSO) communication system…

    Submitted 21 January, 2022; originally announced January 2022.

  42. arXiv:2112.07891  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

    Authors: Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov

    Abstract: Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a univ…

    Submitted 12 February, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Preprint version for Association for the Advancement of Artificial Intelligence Conference, AAAI 2022

  43. arXiv:2112.00333  [pdf, ps, other]

    eess.SY cs.LG

    Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model

    Authors: Botao Zhu, Ebrahim Bedeer, Ha H. Nguyen, Robert Barton, Jerome Henry

    Abstract: Employing unmanned aerial vehicles (UAVs) has attracted growing interests and emerged as the state-of-the-art technology for data collection in Internet-of-Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a co…

    Submitted 1 December, 2021; originally announced December 2021.

    Comments: This paper has been accepted in IEEE IoT-J

  44. Deep Instance Segmentation with Automotive Radar Detection Points

    Authors: Jianan Liu, Weiyi Xiong, Liping Bai, Yuxuan Xia, Tao Huang, Wanli Ouyang, Bing Zhu

    Abstract: Automotive radar provides reliable environmental perception in all-weather conditions with affordable cost, but it hardly supplies semantic and geometry information due to the sparsity of radar detection points. With the development of automotive radar technologies in recent years, instance segmentation becomes possible by using automotive radar. Its data contain contexts such as radar cross secti…

    Submitted 5 February, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: 11 pages, 9 figures, 3 tables, accepted by IEEE Transactions on Intelligent Vehicles

  45. arXiv:2109.14926  [pdf, other]

    math.NA eess.SP eess.SY math.OC

    A Fast Robust Numerical Continuation Solver to a Two-Dimensional Spectral Estimation Problem

    Authors: Bin Zhu, Jiahao Liu

    Abstract: This paper presents a fast algorithm to solve a spectral estimation problem for two-dimensional random fields. The latter is formulated as a convex optimization problem with the Itakura-Saito pseudodistance as the objective function subject to the constraints of moment equations. We exploit the structure of the Hessian of the dual objective function in order to make possible a fast Newton solver.…

    Submitted 30 September, 2021; originally announced September 2021.

    Comments: 13 pages, 8 figures

  46. arXiv:2109.05848  [pdf, other]

    eess.SP cs.AR

    Closed-Loop Neural Prostheses with On-Chip Intelligence: A Review and A Low-Latency Machine Learning Model for Brain State Detection

    Authors: Bingzhao Zhu, Uisub Shin, Mahsa Shoaran

    Abstract: The application of closed-loop approaches in systems neuroscience and therapeutic stimulation holds great promise for revolutionizing our understanding of the brain and for developing novel neuromodulation therapies to restore lost functions. Neural prostheses capable of multi-channel neural recording, on-site signal processing, rapid symptom detection, and closed-loop stimulation are critical to…

    Submitted 13 September, 2021; originally announced September 2021.

  47. arXiv:2109.03990  [pdf, other]

    eess.SP

    A Novel Method to Estimate the Coordinates of LEDs in Wireless Optical Positioning Systems

    Authors: Kehan Zhang, Zaichen Zhang, Bingcheng Zhu

    Abstract: Traditional visible light positioning (VLP) systems estimate receivers' coordinates based on the known light-emitting diode (LED) coordinates. However, the LED coordinates are not always known accurately. Because of the structural changes of the buildings due to temperature, humidity or material aging, even measured by highly accurate laser range finders, the LED coordinates may change unpredictab…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: 5 pages, 4 figures, conference

  48. arXiv:2109.00354  [pdf, ps, other]

    cs.IT eess.SP

    Outage Analysis and Beamwidth Optimization for Positioning-Assisted Beamforming

    Authors: Bingcheng Zhu, Zaichen Zhang, Julian Cheng

    Abstract: Conventional beamforming is based on channel estimation, which can be computationally intensive and inaccurate when the antenna array is large. In this work, we study the outage probability of positioning-assisted beamforming systems. Closed-form outage probability bounds are derived by considering positioning error, link distance and beamwidth. Based on the analytical result, we show that the bea…

    Submitted 9 April, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

  49. arXiv:2108.00354  [pdf, ps, other]

    eess.SY cs.LG

    UAV Trajectory Planning in Wireless Sensor Networks for Energy Consumption Minimization by Deep Reinforcement Learning

    Authors: Botao Zhu, Ebrahim Bedeer, Ha H. Nguyen, Robert Barton, Jerome Henry

    Abstract: Unmanned aerial vehicles (UAVs) have emerged as a promising candidate solution for data collection of large-scale wireless sensor networks (WSNs). In this paper, we investigate a UAV-aided WSN, where cluster heads (CHs) receive data from their member nodes, and a UAV is dispatched to collect data from CHs along the planned trajectory. We aim to minimize the total energy consumption of the UAV-WSN…

    Submitted 31 July, 2021; originally announced August 2021.

    Journal ref: IEEE TVT, 2021

  50. arXiv:2106.11411  [pdf, other]

    cs.SD eess.AS

    Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams

    Authors: Yuanbo Hou, Zhesong Yu, Xia Liang, Xingjian Du, Bilei Zhu, Zejun Ma, Dick Botteldooren

    Abstract: Many previous audio-visual voice-related works focus on speech, ignoring the singing voice in the growing number of musical video streams on the Internet. For processing diverse musical video data, voice activity detection is a necessary step. This paper attempts to detect the speech and singing voices of target performers in musical video streams using audiovisual information. To integrate inform…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted by INTERSPEECH 2021