
Showing 1–50 of 155 results for author: Zhao, M

Searching in archive eess.
  1. arXiv:2507.07526

    cs.SD eess.AS

    DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction

    Authors: Cunhang Fan, Sheng Zhang, Jingjing Zhang, Enrui Liu, Xinhui Li, Minggang Zhao, Zhao Lv

    Abstract: Decoding speech from brain signals is a challenging research problem. Although existing technologies have made progress in reconstructing the mel spectrograms of auditory stimuli at the word or letter level, there remain core challenges in the precise reconstruction of minute-level continuous imagined speech: traditional models struggle to balance the efficiency of temporal dependency modeling and…

    Submitted 10 July, 2025; originally announced July 2025.

    Comments: Accepted by ACM MM 2025

  2. arXiv:2506.23301

    cs.IT eess.SP

    Parallax QAMA: Novel Downlink Multiple Access for MISO Systems with Simple Receivers

    Authors: Jie Huang, Ming Zhao, Shengli Zhou, Ling Qiu, Jinkang Zhu

    Abstract: In this paper, we propose a novel downlink multiple access system with a multi-antenna transmitter and two single-antenna receivers, inspired by the underlying principles of hierarchical quadrature amplitude modulation (H-QAM) based multiple access (QAMA) and space-division multiple access (SDMA). In the proposed scheme, coded bits from two users are split and assigned to one shared symbol and two…

    Submitted 29 June, 2025; originally announced June 2025.

  3. arXiv:2506.22216

    cs.CV eess.IV

    ReF-LLE: Personalized Low-Light Enhancement via Reference-Guided Deep Reinforcement Learning

    Authors: Ming Zhao, Pingping Liu, Tongshun Zhang, Zhe Zhang

    Abstract: Low-light image enhancement presents two primary challenges: 1) Significant variations in low-light images across different conditions, and 2) Enhancement levels influenced by subjective preferences and user intent. To address these issues, we propose ReF-LLE, a novel personalized low-light image enhancement method that operates in the Fourier frequency domain and incorporates deep reinforcement l…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 6 pages, 8 figures, accepted by ICME2025

  4. arXiv:2506.20158

    cs.IT eess.SP

    Efficient Channel Estimation for Rotatable Antenna-Enabled Wireless Communication

    Authors: Xue Xiong, Beixiong Zheng, Wen Wu, Xiaodan Shao, Liang Dai, Ming-Min Zhao, Jie Tang

    Abstract: Non-fixed flexible antenna architectures, such as fluid antenna system (FAS), movable antenna (MA), and pinching antenna, have garnered significant interest in recent years. Among them, rotatable antenna (RA) is a promising antenna architecture that exploits additional spatial degrees of freedom (DoFs) to enhance the communication performance. To fully obtain the performance gain provided by RAs,…

    Submitted 29 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: 5 pages, 4 figures

  5. arXiv:2504.20653

    cs.SE eess.SY

    ComplexVCoder: An LLM-Driven Framework for Systematic Generation of Complex Verilog Code

    Authors: Jian Zuo, Junzhe Liu, Xianyong Wang, Yicheng Liu, Navya Goli, Tong Xu, Hao Zhang, Umamaheswara Rao Tida, Zhenge Jia, Mengying Zhao

    Abstract: Recent advances have demonstrated the promising capabilities of large language models (LLMs) in generating register-transfer level (RTL) code, such as Verilog. However, existing LLM-based frameworks still face significant challenges in accurately handling the complexity of real-world RTL designs, particularly those that are large-scale and involve multi-level module instantiations. To address this…

    Submitted 29 April, 2025; originally announced April 2025.

  6. arXiv:2504.13741

    cs.IT eess.SP

    Sensing-Then-Beamforming: Robust Transmission Design for RIS-Empowered Integrated Sensing and Covert Communication

    Authors: Xingyu Zhao, Min Li, Ming-Min Zhao, Shihao Yan, Min-Jian Zhao

    Abstract: Traditional covert communication often relies on the knowledge of the warden's channel state information, which is inherently challenging to obtain due to the non-cooperative nature and potential mobility of the warden. The integration of sensing and communication technology provides a promising solution by enabling the legitimate transmitter to sense and track the warden, thereby enhancing transm…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 13 pages; submitted for possible publication

  7. arXiv:2503.22486

    cs.IT eess.SP

    Movable Antenna Enhanced Downlink Multi-User Integrated Sensing and Communication System

    Authors: Yanze Han, Min Li, Xingyu Zhao, Ming-Min Zhao, Min-Jian Zhao

    Abstract: This work investigates the potential of exploiting movable antennas (MAs) to enhance the performance of a multi-user downlink integrated sensing and communication (ISAC) system. Specifically, we formulate an optimization problem to maximize the transmit beampattern gain for sensing while simultaneously meeting each user's communication requirement by jointly optimizing antenna positions and beamfo…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: accepted and to appear in IEEE VTC2025-Spring

  8. arXiv:2503.21818

    eess.IV cs.CV

    Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis

    Authors: Tianqi Tu, Hui Wang, Jiangbo Pei, Xiaojuan Yu, Aidong Men, Suxia Wang, Qingchao Chen, Ying Tan, Feng Yu, Minghui Zhao

    Abstract: Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment…

    Submitted 26 March, 2025; originally announced March 2025.

  9. arXiv:2503.16149

    eess.IV cs.CV

    Selective Complementary Feature Fusion and Modal Feature Compression Interaction for Brain Tumor Segmentation

    Authors: Dong Chen, Boyue Zhao, Yi Zhang, Meng Zhao

    Abstract: Efficient modal feature fusion strategy is the key to achieve accurate segmentation of brain glioma. However, due to the specificity of different MRI modes, it is difficult to carry out cross-modal fusion with large differences in modal features, resulting in the model ignoring rich feature information. On the other hand, the problem of multi-modal feature redundancy interaction occurs in parallel…

    Submitted 20 March, 2025; originally announced March 2025.

  10. arXiv:2503.11551

    cs.RO eess.SY

    Vectorable Thrust Control for Multimodal Locomotion of Quadruped Robot SPIDAR

    Authors: Moju Zhao

    Abstract: In this paper, I present vectorable thrust control for different locomotion modes by a novel quadruped robot, SPIDAR, equipped with vectoring rotor in each link. First, the robot's unique mechanical design, the dynamics model, and the basic control framework for terrestrial/aerial locomotion are briefly introduced. Second, a vectorable thrust control method derived from the basic control framework…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 16 Pages. Presented in International Symposium of Robotics Research (ISRR) 2024, Long Beach, USA

  11. arXiv:2503.11190

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Cross-Modal Learning for Music-to-Music-Video Description Generation

    Authors: Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu, Zhi Zhong, Wei-Hsiang Liao, Hiromi Wakaki, Yuki Mitsufuji

    Abstract: Music-to-music-video generation is a challenging task due to the intrinsic differences between the music and video modalities. The advent of powerful text-to-video diffusion models has opened a promising pathway for music-video (MV) generation by first addressing the music-to-MV description task and subsequently leveraging these models for video generation. In this study, we focus on the MV descri…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Accepted by RepL4NLP 2025 @ NAACL 2025

  12. arXiv:2503.04407

    eess.SP cs.IT

    Ambiguity Function Analysis and Optimization of Frequency-Hopping MIMO Radar with Movable Antennas

    Authors: Xiang Chen, Ming-Min Zhao, Min Li, Liyan Li, Min-Jian Zhao, Jiangzhou Wang

    Abstract: In this paper, we propose a movable antenna (MA)-enabled frequency-hopping (FH) multiple-input multiple-output (MIMO) radar system and investigate its sensing resolution. Specifically, we derive the expression of the ambiguity function and analyze the relationship between its main lobe width and the transmit antenna positions. In particular, the optimal antenna distribution to achieve the minimum…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 15 pages, 13 figures

  13. arXiv:2502.20690

    eess.SP

    Multi-model Stochastic Particle-based Variational Bayesian Inference for Multiband Delay Estimation

    Authors: Zhixiang Hu, An Liu, Minjian Zhao

    Abstract: Joint utilization of multiple discrete frequency bands can enhance the accuracy of delay estimation. Although some unique challenges of multiband fusion, such as phase distortion, oscillation phenomena, and high-dimensional search, have been partially addressed, further challenges remain. Specifically, under conditions of low signal-to-noise ratio (SNR), insufficient data, and closely spaced delay…

    Submitted 8 July, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

  14. arXiv:2502.20054

    cs.RO eess.SY

    Night-Voyager: Consistent and Efficient Nocturnal Vision-Aided State Estimation in Object Maps

    Authors: Tianxiao Gao, Mingle Zhao, Chengzhong Xu, Hui Kong

    Abstract: Accurate and robust state estimation at nighttime is essential for autonomous robotic navigation to achieve nocturnal or round-the-clock tasks. An intuitive question arises: Can low-cost standard cameras be exploited for nocturnal state estimation? Regrettably, most existing visual methods may fail under adverse illumination conditions, even with active lighting or image enhancement. A pivotal ins…

    Submitted 4 March, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: IEEE Transactions on Robotics (T-RO), 2025

  15. arXiv:2502.16709

    eess.IV cs.CV

    FedDA-TSformer: Federated Domain Adaptation with Vision TimeSformer for Left Ventricle Segmentation on Gated Myocardial Perfusion SPECT Image

    Authors: Yehong Huang, Chen Zhao, Rochak Dhakal, Min Zhao, Guang-Uei Hung, Zhixin Jiang, Weihua Zhou

    Abstract: Background and Purpose: Functional assessment of the left ventricle using gated myocardial perfusion (MPS) single-photon emission computed tomography relies on the precise extraction of the left ventricular contours while simultaneously ensuring the security of patient data. Methods: In this paper, we introduce the integration of Federated Domain Adaptation with TimeSformer, named 'FedDA-TSformer'…

    Submitted 23 February, 2025; originally announced February 2025.

  16. arXiv:2502.14198

    cs.IT eess.SP

    Antenna Position and Beamforming Optimization for Movable Antenna Enabled ISAC: Optimal Solutions and Efficient Algorithms

    Authors: Lebin Chen, Ming-Min Zhao, Min-Jian Zhao, Rui Zhang

    Abstract: In this paper, we propose an integrated sensing and communication (ISAC) system enabled by movable antennas (MAs), which can dynamically adjust antenna positions to enhance both sensing and communication performance for future wireless networks. To characterize the benefits of MA-enabled ISAC systems, we first derive the Cramér-Rao bound (CRB) for angle estimation error, which is then minimized fo…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: 13 pages, 7 figures

  17. arXiv:2502.12623

    cs.SD cs.AI cs.CL cs.MM eess.AS

    DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning

    Authors: Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu, Hiromi Wakaki, Yuki Mitsufuji

    Abstract: Recent advancements in music large language models (LLMs) have significantly improved music understanding tasks, which involve the model's ability to analyze and interpret various musical elements. These improvements primarily focused on integrating both music and text inputs. However, the potential of incorporating additional modalities such as images, videos and textual music features to enhance…

    Submitted 20 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  18. arXiv:2501.13130

    eess.IV

    A Novel Scene Coupling Semantic Mask Network for Remote Sensing Image Segmentation

    Authors: Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Renxiang Guan, Tingfeng Hong, Mengjiao Zhao, Mengting Ma, Jiangtao Nie, Zhenhong Du, Siyang Song, Wei Zhang

    Abstract: As a common method in the field of computer vision, spatial attention mechanism has been widely used in semantic segmentation of remote sensing images due to its outstanding long-range dependency modeling capability. However, remote sensing images are usually characterized by complex backgrounds and large intra-class variance that would degrade their analysis performance. While vanilla spatial att…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

  19. arXiv:2501.11869

    eess.IV cs.IT stat.AP

    Saturation in Snapshot Compressive Imaging

    Authors: Mengyu Zhao, Shirin Jalali

    Abstract: Snapshot Compressive Imaging (SCI) maps three-dimensional (3D) data cubes, such as videos or hyperspectral images, into two-dimensional (2D) measurements via optical modulation, enabling efficient data acquisition and reconstruction. Recent advances have shown the potential of mask optimization to enhance SCI performance, but most studies overlook nonlinear distortions caused by saturation in prac…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: 13 pages

  20. arXiv:2501.06653

    cs.IT eess.IV stat.AP

    Theoretical Characterization of Effect of Masks in Snapshot Compressive Imaging

    Authors: Mengyu Zhao, Shirin Jalali

    Abstract: Snapshot compressive imaging (SCI) refers to the recovery of three-dimensional data cubes-such as videos or hyperspectral images-from their two-dimensional projections, which are generated by a special encoding of the data with a mask. SCI systems commonly use binary-valued masks that follow certain physical constraints. Optimizing these masks subject to these constraints is expected to improve sy…

    Submitted 11 January, 2025; originally announced January 2025.

    Comments: 27 pages. arXiv admin note: substantial text overlap with arXiv:2307.07796

  21. Separate Source Channel Coding Is Still What You Need: An LLM-based Rethinking

    Authors: Tianqi Ren, Rongpeng Li, Ming-min Zhao, Xianfu Chen, Guangyi Liu, Yang Yang, Zhifeng Zhao, Honggang Zhang

    Abstract: Along with the proliferating research interest in Semantic Communication (SemCom), Joint Source Channel Coding (JSCC) has dominated the attention due to the widely assumed existence in efficiently delivering information semantics. Nevertheless, this paper challenges the conventional JSCC paradigm, and advocates for adoption of Separate Source Channel Coding (SSCC) to enjoy the underlying more degr…

    Submitted 26 May, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

    Journal ref: ZTE Communications, vol. 23, no. 1, pp. 30-44, Mar. 2025

  22. arXiv:2412.19475

    eess.SP

    Exploiting Dynamic Sparsity for Near-Field Spatial Non-Stationary XL-MIMO Channel Tracking

    Authors: Wenkang Xu, An Liu, Min-jian Zhao, Giuseppe Caire, Yik-Chung Wu

    Abstract: This work considers a spatial non-stationary channel tracking problem in broadband extremely large-scale multiple-input-multiple-output (XL-MIMO) systems. In the case of spatial non-stationary, each scatterer has a certain visibility region (VR) over antennas and power change may occur among visible antennas. Concentrating on the temporal correlation of XL-MIMO channels, we design a three-layer Ma…

    Submitted 31 March, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

    Comments: 13 pages, 11 figures, Submitted to IEEE TSP

  23. arXiv:2411.11192

    cs.RO cs.MA eess.SY

    Robot Metabolism: Towards machines that can grow by consuming other machines

    Authors: Philippe Martin Wyder, Riyaan Bakhda, Meiqi Zhao, Quinn A. Booth, Matthew E. Modi, Andrew Song, Simon Kang, Jiahao Wu, Priya Patel, Robert T. Kasumi, David Yi, Nihar Niraj Garg, Pranav Jhunjhunwala, Siddharth Bhutoria, Evan H. Tong, Yuhang Hu, Judah Goldfeder, Omer Mustel, Donghan Kim, Hod Lipson

    Abstract: Biological lifeforms can heal, grow, adapt, and reproduce -- abilities essential for sustained survival and development. In contrast, robots today are primarily monolithic machines with limited ability to self-repair, physically develop, or incorporate material from their environments. A key challenge to such physical adaptation has been that while robot minds are rapidly evolving new behaviors th…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: Manuscript combined with Supplementary Materials File for arXiv submission. Submitting to Journal and will update external DOI once available

    MSC Class: 70-01; 68-02

    ACM Class: I.6; H.4; H.m; I.m; B.m

  24. arXiv:2411.06307

    cs.SD eess.AS

    Acoustic Volume Rendering for Neural Impulse Response Fields

    Authors: Zitong Lan, Chenhao Zheng, Zhiwei Zheng, Mingmin Zhao

    Abstract: Realistic audio synthesis that captures accurate acoustic phenomena is essential for creating immersive experiences in virtual and augmented reality. Synthesizing the sound received at any position relies on the estimation of impulse response (IR), which characterizes how sound propagates in one scene along different paths before arriving at the listener's position. In this paper, we present Acous…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Spotlight

  25. arXiv:2410.15573

    cs.SD cs.AI cs.CL cs.MM eess.AS

    OpenMU: Your Swiss Army Knife for Music Understanding

    Authors: Mengjie Zhao, Zhi Zhong, Zhuoyuan Mao, Shiqi Yang, Wei-Hsiang Liao, Shusuke Takahashi, Hiromi Wakaki, Yuki Mitsufuji

    Abstract: We present OpenMU-Bench, a large-scale benchmark suite for addressing the data scarcity issue in training multimodal language models to understand music. To construct OpenMU-Bench, we leveraged existing datasets and bootstrapped new annotations. OpenMU-Bench also broadens the scope of music understanding by including lyrics understanding and music tool usage. Using OpenMU-Bench, we trained our mus…

    Submitted 27 November, 2024; v1 submitted 20 October, 2024; originally announced October 2024.

    Comments: Resources: https://github.com/sony/openmu

  26. arXiv:2410.05151

    eess.AS cs.SD

    Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer

    Authors: Siyuan Hou, Shansong Liu, Ruibin Yuan, Wei Xue, Ying Shan, Mangsuo Zhao, Chao Zhang

    Abstract: Despite the significant progress in controllable music generation and editing, challenges remain in the quality and length of generated music due to the use of Mel-spectrogram representations and UNet-based model structures. To address these limitations, we propose a novel approach using a Diffusion Transformer (DiT) augmented with an additional control branch using ControlNet. This allows for lon…

    Submitted 16 January, 2025; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted for publication at ICASSP 2025

  27. arXiv:2410.00404

    eess.IV cs.CV

    3DGR-CAR: Coronary artery reconstruction from ultra-sparse 2D X-ray views with a 3D Gaussians representation

    Authors: Xueming Fu, Yingtai Li, Fenghe Tang, Jun Li, Mingyue Zhao, Gao-Jun Teng, S. Kevin Zhou

    Abstract: Reconstructing 3D coronary arteries is important for coronary artery disease diagnosis, treatment planning and operation navigation. Traditional reconstruction techniques often require many projections, while reconstruction from sparse-view X-ray projections is a potential way of reducing radiation dose. However, the extreme sparsity of coronary arteries in a 3D volume and ultra-limited number of…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: 10 pages, 5 figures, Accepted at MICCAI 2024

  28. arXiv:2409.07488

    eess.SP cs.LG

    Contrastive Learning-based User Identification with Limited Data on Smart Textiles

    Authors: Yunkang Zhang, Ziyu Wu, Zhen Liang, Fangting Xie, Quan Wan, Mingjie Zhao, Xiaohui Cai

    Abstract: Pressure-sensitive smart textiles are widely applied in the fields of healthcare, sports monitoring, and intelligent homes. The integration of devices embedded with pressure sensing arrays is expected to enable comprehensive scene coverage and multi-device integration. However, the implementation of identity recognition, a fundamental function in this context, relies on extensive device-specific d…

    Submitted 6 September, 2024; originally announced September 2024.

  29. arXiv:2409.04719

    eess.IV

    Unrolling Plug-and-Play Network for Hyperspectral Unmixing

    Authors: Min Zhao, Linruize Tang, Jie Chen

    Abstract: Deep learning based unmixing methods have received great attention in recent years and achieve remarkable performance. These methods employ a data-driven approach to extract structure features from hyperspectral image, however, they tend to be less physical interpretable. Conventional unmixing methods are with much more interpretability, whereas they require manually designing regularization and c…

    Submitted 7 September, 2024; originally announced September 2024.

  30. arXiv:2408.02943

    eess.SP

    Recent Advances in Data-driven Intelligent Control for Wireless Communication: A Comprehensive Survey

    Authors: Wei Huo, Huiwen Yang, Nachuan Yang, Zhaohua Yang, Jiuzhou Zhang, Fuhai Nan, Xingzhou Chen, Yifan Mao, Suyang Hu, Pengyu Wang, Xuanyu Zheng, Mingming Zhao, Ling Shi

    Abstract: The advent of next-generation wireless communication systems heralds an era characterized by high data rates, low latency, massive connectivity, and superior energy efficiency. These systems necessitate innovative and adaptive strategies for resource allocation and device behavior control in wireless networks. Traditional optimization-based methods have been found inadequate in meeting the complex…

    Submitted 6 August, 2024; originally announced August 2024.

  31. arXiv:2407.21394

    eess.IV cs.CV

    Force Sensing Guided Artery-Vein Segmentation via Sequential Ultrasound Images

    Authors: Yimeng Geng, Gaofeng Meng, Mingcong Chen, Guanglin Cao, Mingyang Zhao, Jianbo Zhao, Hongbin Liu

    Abstract: Accurate identification of arteries and veins in ultrasound images is crucial for vascular examinations and interventions in robotics-assisted surgeries. However, current methods for ultrasound vessel segmentation face challenges in distinguishing between arteries and veins due to their morphological similarities. To address this challenge, this study introduces a novel force sensing guided segmen…

    Submitted 31 July, 2024; originally announced July 2024.

  32. arXiv:2407.05619

    cs.RO eess.SY

    AIRA: A Low-cost IR-based Approach Towards Autonomous Precision Drone Landing and NLOS Indoor Navigation

    Authors: Yanchen Liu, Minghui Zhao, Kaiyuan Hou, Junxi Xia, Charlie Carver, Stephen Xia, Xia Zhou, Xiaofan Jiang

    Abstract: Automatic drone landing is an important step for achieving fully autonomous drones. Although there are many works that leverage GPS, video, wireless signals, and active acoustic sensing to perform precise landing, autonomous drone landing remains an unsolved challenge for palm-sized microdrones that may not be able to support the high computational requirements of vision, wireless, or active audio…

    Submitted 8 July, 2024; originally announced July 2024.

  33. arXiv:2406.17672

    cs.SD eess.AS

    SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

    Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recent advances in generative models that iteratively synthesize audio clips sparked great success to text-to-audio synthesis (TTA), but with the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to hundreds of iterations required in the inference phase and large amount of mod…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

  34. arXiv:2406.08305

    cs.NI eess.SP

    Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization

    Authors: Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Tianchi Huang, Nei Kato

    Abstract: Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learn…

    Submitted 2 March, 2025; v1 submitted 12 June, 2024; originally announced June 2024.

  35. arXiv:2405.19516

    eess.SP cs.CV cs.LG cs.RO

    Enabling Visual Recognition at Radio Frequency

    Authors: Haowen Lai, Gaoxiang Luo, Yifei Liu, Mingmin Zhao

    Abstract: This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. Pano…

    Submitted 29 May, 2024; originally announced May 2024.

  36. arXiv:2405.16791

    cs.IT eess.SP

    Joint Node Selection and Resource Allocation Optimization for Cooperative Sensing with a Shared Wireless Backhaul

    Authors: Mingxin Chen, Ming-Min Zhao, An Liu, Min Li, Qingjiang Shi

    Abstract: In this paper, we consider a cooperative sensing framework in the context of future multi-functional network with both communication and sensing ability, where one base station (BS) serves as a sensing transmitter and several nearby BSs serve as sensing receivers. Each receiver receives the sensing signal reflected by the target and communicates with the fusion center (FC) through a wireless multi…

    Submitted 16 December, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 16 pages, 12 figures

  37. arXiv:2405.14598

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

    Authors: Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

    Abstract: In recent years, with the realistic generation results and a wide range of personalized applications, diffusion-based generative models gain huge attention in both visual and audio generation areas. Compared to the considerable advancements of text2image or text2audio generation, research in audio2visual or visual2audio generation has been relatively slow. The recent audio-visual generation method…

    Submitted 24 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: 10 pages

  38. Design, Control, and Motion-Planning for a Root-Perching Rotor-Distributed Manipulator

    Authors: Takuzumi Nishio, Moju Zhao, Kei Okada, Masayuki Inaba

    Abstract: Manipulation performance improvement is crucial for aerial robots. For aerial manipulators, the baselink position and attitude errors directly affect the precision at the end effector. To address this stability problem, fixed-body approaches such as perching on the environment using the rotor suction force are useful. Additionally, conventional arm-equipped multirotors, called rotor-concentrated m…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: IEEE Transactions on Robotics (2023)

  39. arXiv:2405.04027

    eess.SP

    Joint Visibility Region Detection and Channel Estimation for XL-MIMO Systems via Alternating MAP

    Authors: Wenkang Xu, An Liu, Min-jian Zhao

    Abstract: We investigate a joint visibility region (VR) detection and channel estimation problem in extremely large-scale multiple-input-multiple-output (XL-MIMO) systems, where near-field propagation and spatial non-stationary effects exist. In this case, each scatterer can only see a subset of antennas, i.e., it has a certain VR over the antennas. Because of the spatial correlation among adjacent sub-arra…

    Submitted 21 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, 14 figures, submitted to IEEE TSP

  40. arXiv:2405.01242

    cs.SD cs.AI cs.LG eess.AS

    TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

    Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

    Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art m…

    Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  41. arXiv:2403.10873

    cs.IT eess.SP

    CSI Transfer From Sub-6G to mmWave: Reduced-Overhead Multi-User Hybrid Beamforming

    Authors: Weicao Deng, Min Li, Ming-Min Zhao, Min-Jian Zhao, Osvaldo Simeone

    Abstract: Hybrid beamforming is vital in modern wireless systems, especially for massive MIMO and millimeter-wave (mmWave) deployments, offering efficient directional transmission with reduced hardware complexity. However, effective beamforming in multi-user scenarios relies heavily on accurate channel state information, the acquisition of which often requires significant pilot overhead, degrading system pe…

    Submitted 14 November, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE JSAC NGAT

  42. arXiv:2402.03042

    eess.SP

    Semi-Passive Intelligent Reflecting Surface Enabled Sensing Systems

    Authors: Qiaoyan Peng, Qingqing Wu, Wen Chen, Shaodan Ma, Ming-Min Zhao, Octavia A. Dobre

    Abstract: Intelligent reflecting surface (IRS) has garnered growing interest and attention due to its potential for facilitating and supporting wireless communications and sensing. This paper studies a semi-passive IRS-enabled sensing system, where an IRS consists of both passive reflecting elements and active sensors. Our goal is to minimize the Cramér-Rao bound (CRB) for parameter estimation under both po…

    Submitted 5 February, 2024; originally announced February 2024.

  43. arXiv:2311.12745

    cs.NI eess.SP

    Learn to Augment Network Simulators Towards Digital Network Twins

    Authors: Yuru Zhang, Ming Zhao, Qiang Liu

    Abstract: Digital network twin (DNT) is a promising paradigm to replicate real-world cellular networks toward continual assessment, proactive management, and what-if analysis. Existing discussions have focused on using only deep learning techniques to build DNTs, which raises widespread concerns regarding their generalization, explainability, and transparency. In this paper, we explore an alternative…

    Submitted 21 November, 2023; originally announced November 2023.

  44. arXiv:2311.08201

    eess.SP

    Joint Location Sensing and Channel Estimation for IRS-Aided mmWave ISAC Systems

    Authors: Zijian Chen, Ming-Min Zhao, Min Li, Fan Xu, Qingqing Wu, Min-Jian Zhao

    Abstract: In this paper, we investigate a self-sensing intelligent reflecting surface (IRS) aided millimeter wave (mmWave) integrated sensing and communication (ISAC) system. Unlike the conventional purely passive IRS, the self-sensing IRS can effectively reduce the path loss of sensing-related links, thus rendering it advantageous in ISAC systems. Aiming to jointly sense the target/scatterer/user positions…

    Submitted 14 November, 2023; originally announced November 2023.

  45. arXiv:2311.08188

    cs.IT eess.SP

    Fast List Decoding of High-Rate Polar Codes

    Authors: Yang Lu, Ming-Min Zhao, Ming Lei, Min-Jian Zhao

    Abstract: Due to its ability to provide superior error-correction performance, the successive cancellation list (SCL) algorithm is widely regarded as one of the most promising decoding algorithms for polar codes with short-to-moderate code lengths. However, the application of SCL decoding in low-latency communication scenarios is limited due to its sequential nature. To reduce the decoding latency, developi…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures

  46. arXiv:2310.13267

    cs.CL cs.CV cs.LG cs.SD eess.AS

    On the Language Encoder of Contrastive Cross-modal Models

    Authors: Mengjie Zhao, Junya Ono, Zhi Zhong, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Takashi Shibuya, Hiromi Wakaki, Yuki Mitsufuji

    Abstract: Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component of encoding natural language descriptions of image/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding…

    Submitted 20 October, 2023; originally announced October 2023.

  47. arXiv:2310.05382

    eess.SP

    A Stochastic Particle Variational Bayesian Inference Inspired Deep-Unfolding Network for Non-Convex Parameter Estimation

    Authors: Zhixiang Hu, An Liu, Minjian Zhao

    Abstract: Future wireless networks are envisioned to provide ubiquitous sensing services, which also gives rise to a substantial demand for high-dimensional non-convex parameter estimation, i.e., the associated likelihood function is non-convex and contains numerous local optima. Variational Bayesian inference (VBI) provides a powerful tool for modeling complex estimation problems and reasoning with prior i…

    Submitted 8 October, 2023; originally announced October 2023.

  48. arXiv:2309.04508

    cs.LG cs.AI eess.SP

    Spatial-Temporal Graph Attention Fuser for Calibration in IoT Air Pollution Monitoring Systems

    Authors: Keivan Faghih Niresi, Mengjie Zhao, Hugo Bissig, Henri Baumann, Olga Fink

    Abstract: The use of Internet of Things (IoT) sensors for air pollution monitoring has significantly increased, resulting in the deployment of low-cost sensors. Despite this advancement, accurately calibrating these sensors in uncontrolled environmental conditions remains a challenge. To address this, we propose a novel approach that leverages graph neural networks, specifically the graph attention network…

    Submitted 8 September, 2023; originally announced September 2023.

  49. arXiv:2309.03114

    eess.SP

    NUV-DoA: NUV Prior-based Bayesian Sparse Reconstruction with Spatial Filtering for Super-Resolution DoA Estimation

    Authors: Mengyuan Zhao, Guy Revach, Tirza Routtenberg, Nir Shlezinger

    Abstract: Achieving high-resolution Direction of Arrival (DoA) recovery typically requires a high Signal-to-Noise Ratio (SNR) and a sufficiently large number of snapshots. This paper presents the NUV-DoA algorithm, which augments Bayesian sparse reconstruction with spatial filtering for super-resolution DoA estimation. By modeling each direction on the azimuth's grid with the sparsity-promoting normal with unknown…

    Submitted 25 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 5 pages including references, 11 figures. Accepted to ICASSP 2024 (2024 International Conference on Acoustics, Speech, and Signal Processing).

  50. arXiv:2308.13996

    cs.LG eess.SY

    Improve in-situ life prediction and classification performance by capturing both the present state and evolution rate of battery aging

    Authors: Mingyuan Zhao, Yongzhi Zhang

    Abstract: This study develops a methodology that captures both the battery aging state and degradation rate for improved life prediction performance. The aging state is indicated by six physical features of an equivalent circuit model that are extracted from the voltage relaxation data, and the degradation rate is captured by two features extracted from the differences between the voltage relaxation curves w…

    Submitted 26 August, 2023; originally announced August 2023.