Showing 1–50 of 269 results for author: Zheng, Y

Searching in archive eess.
  1. arXiv:2507.03343  [pdf, ps, other]

    cs.CL eess.AS

    SHNU Multilingual Conversational Speech Recognition System for INTERSPEECH 2025 MLC-SLM Challenge

    Authors: Yuxiang Mei, Yuang Zheng, Dongxing Xu, Yanhua Long

    Abstract: This paper describes SHNU multilingual conversational speech recognition system (SHNU-mASR, team name-"maybe"), submitted to Track 1 of the INTERSPEECH 2025 MLC-SLM Challenge. Our system integrates a parallel-speech-encoder architecture with a large language model (LLM) to form a unified multilingual ASR framework. The parallel-speech-encoder consists of two pre-trained encoders, the Whisper-large…

    Submitted 8 July, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

    Comments: Accepted by Interspeech 2025 MLC-SLM workshop

  2. arXiv:2506.22059  [pdf, ps, other]

    eess.SP

    Hybrid Constellation Modulation for Symbol-Level Precoding in RIS-Enhanced MU-MISO Systems

    Authors: Yupeng Zheng, Yi Ma, Rahim Tafazolli

    Abstract: The application of symbol-level precoding (SLP) in reconfigurable intelligent surfaces (RIS) enhanced multi-user multiple-input single-output (MU-MISO) systems faces two main challenges. First, the state-of-the-art joint reflecting and SLP optimization approach requires exhaustive enumeration of all possible transmit symbol combinations, resulting in scalability issues as the modulation order and…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: This work has been accepted by IEEE SPAWC 2025

  3. arXiv:2506.12712  [pdf, ps, other]

    cs.CV eess.IV

    Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups

    Authors: Zhenghao Xi, Zhengnan Lv, Yang Zheng, Xiang Liu, Zhuang Yu, Junran Chen, Jing Hu, Yaqi Liu

    Abstract: The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups use the method of stacking parameters to achieve higher accuracy. It leads to increased computational requirements and impacts mo…

    Submitted 15 June, 2025; originally announced June 2025.

  4. Rethinking Brain Tumor Segmentation from the Frequency Domain Perspective

    Authors: Minye Shao, Zeyu Wang, Haoran Duan, Yawen Huang, Bing Zhai, Shizheng Wang, Yang Long, Yefeng Zheng

    Abstract: Precise segmentation of brain tumors, particularly contrast-enhancing regions visible in post-contrast MRI (areas highlighted by contrast agent injection), is crucial for accurate clinical diagnosis and treatment planning but remains challenging. However, current methods exhibit notable performance degradation in segmenting these enhancing brain tumor areas, largely due to insufficient considerati…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: Accepted by IEEE Transactions on Medical Imaging

  5. arXiv:2506.02365  [pdf, ps, other]

    cs.RO eess.SY

    Dynamic real-time multi-UAV cooperative mission planning method under multiple constraints

    Authors: Chenglou Liu, Yufeng Lu, Fangfang Xie, Tingwei Ji, Yao Zheng

    Abstract: As UAV popularity soars, so does the mission planning associated with it. The classical approaches suffer from the triple problems of decoupled of task assignment and path planning, poor real-time performance and limited adaptability. Aiming at these challenges, this paper proposes a dynamic real-time multi-UAV collaborative mission planning algorithm based on Dubins paths under a distributed form…

    Submitted 2 June, 2025; originally announced June 2025.

  6. arXiv:2505.23743  [pdf, ps, other]

    cs.CV eess.IV

    DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP

    Authors: Amber Yijia Zheng, Yu Zhang, Jun Hu, Raymond A. Yeh, Chen Chen

    Abstract: High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmooth…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.21181  [pdf]

    cs.CV eess.IV

    Boosting Adversarial Transferability via High-Frequency Augmentation and Hierarchical-Gradient Fusion

    Authors: Yayin Zheng, Chen Wan, Zihong Guo, Hailing Kuang, Xiaohai Lu

    Abstract: Adversarial attacks have become a significant challenge in the security of machine learning models, particularly in the context of black-box defense strategies. Existing methods for enhancing adversarial transferability primarily focus on the spatial domain. This paper presents Frequency-Space Attack (FSA), a new adversarial attack framework that effectively integrates frequency-domain and spatial…

    Submitted 27 May, 2025; originally announced May 2025.

  8. arXiv:2505.09616  [pdf, other]

    cs.SD cs.AI eess.AS

    SpecWav-Attack: Leveraging Spectrogram Resizing and Wav2Vec 2.0 for Attacking Anonymized Speech

    Authors: Yuqi Li, Yuanzhong Zheng, Zhongtian Guo, Yaoxuan Wang, Jianjun Yin, Haojun Fei

    Abstract: This paper presents SpecWav-Attack, an adversarial model for detecting speakers in anonymized speech. It leverages Wav2Vec2 for feature extraction and incorporates spectrogram resizing and incremental training for improved performance. Evaluated on librispeech-dev and librispeech-test, SpecWav-Attack outperforms conventional attacks, revealing vulnerabilities in anonymized speech systems and empha…

    Submitted 10 January, 2025; originally announced May 2025.

    Comments: 2 pages, 3 figures, 1 chart

    MSC Class: I.2.0

  9. arXiv:2505.08982  [pdf, ps, other]

    cs.LG eess.SP eess.SY

    Model-free Online Learning for the Kalman Filter: Forgetting Factor and Logarithmic Regret

    Authors: Jiachen Qian, Yang Zheng

    Abstract: We consider the problem of online prediction for an unknown, non-explosive linear stochastic system. With a known system model, the optimal predictor is the celebrated Kalman filter. In the case of unknown systems, existing approaches based on recursive least squares and its variants may suffer from degraded performance due to the highly imbalanced nature of the regression model. This imbalance ca…

    Submitted 13 May, 2025; originally announced May 2025.

  10. arXiv:2505.05768  [pdf, other]

    eess.IV cs.AI cs.CV

    Predicting Diabetic Macular Edema Treatment Responses Using OCT: Dataset and Methods of APTOS Competition

    Authors: Weiyi Zhang, Peranut Chotcomwongse, Yinwen Li, Pusheng Xu, Ruijie Yao, Lianhao Zhou, Yuxuan Zhou, Hui Feng, Qiping Zhou, Xinyue Wang, Shoujin Huang, Zihao Jin, Florence H. T. Chung, Shujun Wang, Yalin Zheng, Mingguang He, Danli Shi, Paisan Ruamviboonsuk

    Abstract: Diabetic macular edema (DME) significantly contributes to visual impairment in diabetic patients. Treatment responses to intravitreal therapies vary, highlighting the need for patient stratification to predict therapeutic benefits and enable personalized strategies. To our knowledge, this study is the first to explore pre-treatment stratification for predicting DME treatment responses. To advance…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 42 pages, 5 tables, 12 figures, challenge report

  11. arXiv:2505.03266  [pdf]

    physics.optics cs.IT eess.SP

    Rapid diagnostics of reconfigurable intelligent surfaces using space-time-coding modulation

    Authors: Yi Ning Zheng, Lei Zhang, Xiao Qing Chen, Marco Rossi, Giuseppe Castaldi, Shuo Liu, Tie Jun Cui, Vincenzo Galdi

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a key technology for shaping smart wireless environments in next-generation wireless communication systems. To support the large-scale deployment of RISs, a reliable and efficient diagnostic method is essential to ensure optimal performance. In this work, a robust and efficient approach for RIS diagnostics is proposed using a space-time co…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 30 pages, 6 figures, 1 table, supporting information

  12. arXiv:2504.21612  [pdf, other]

    eess.IV

    Selective Variable Convolution Meets Dynamic Content Guided Attention for Infrared Small Target Detection

    Authors: Yirui Chen, Yiming Zhu, Yuxin Jing, Tianpei Zhang, Yuchen Zheng

    Abstract: Infrared Small Target Detection (IRSTD) system aims to identify small targets in complex backgrounds. Due to the convolution operation in Convolutional Neural Networks (CNNs), applying traditional CNNs to IRSTD presents challenges, since the feature extraction of small targets is often insufficient, resulting in the loss of critical features. To address these issues, we propose a dynamic content g…

    Submitted 30 April, 2025; originally announced April 2025.

  13. arXiv:2504.21581  [pdf, other]

    eess.IV

    Make Both Ends Meet: A Synergistic Optimization Infrared Small Target Detection with Streamlined Computational Overhead

    Authors: Yuxin Jing, Yuchen Zheng, Jufeng Zhao, Guangmang Cui, Tianpei Zhang

    Abstract: Infrared small target detection (IRSTD) is widely recognized as a challenging task due to the inherent limitations of infrared imaging, including low signal-to-noise ratios, lack of texture details, and complex background interference. While most existing methods model IRSTD as a semantic segmentation task, but they suffer from two critical drawbacks: (1) blurred target boundaries caused by long-dis…

    Submitted 4 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

  14. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  15. arXiv:2504.06240  [pdf, other]

    math.OC eess.SY

    Dictionary-free Koopman Predictive Control for Autonomous Vehicles in Mixed Traffic

    Authors: Xu Shang, Zhaojian Li, Yang Zheng

    Abstract: Koopman Model Predictive Control (KMPC) and Data-EnablEd Predictive Control (DeePC) use linear models to approximate nonlinear systems and integrate them with predictive control. Both approaches have recently demonstrated promising performance in controlling Connected and Autonomous Vehicles (CAVs) in mixed traffic. However, selecting appropriate lifting functions for the Koopman operator in KMPC…

    Submitted 8 April, 2025; originally announced April 2025.

  16. arXiv:2504.02061  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Aligned Better, Listen Better for Audio-Visual Large Language Models

    Authors: Yuxin Guo, Shuailei Ma, Shijie Ma, Xiaoyi Bao, Chen-Wei Xie, Kecheng Zheng, Tingyu Weng, Siyang Sun, Yun Zheng, Wei Zou

    Abstract: Audio is essential for multimodal video understanding. On the one hand, video inherently contains audio, which supplies complementary information to vision. Besides, video large language models (Video-LLMs) can encounter many audio-centric settings. However, existing Video-LLMs and Audio-Visual Large Language Models (AV-LLMs) exhibit deficiencies in exploiting audio information, leading to weak un…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted to ICLR 2025

  17. arXiv:2504.00217  [pdf, other]

    math.ST eess.SY

    Non-Asymptotic Analysis of Classical Spectrum Estimators for $L$-mixing Time-series Data with Unknown Means

    Authors: Yuping Zheng, Andrew Lamperski

    Abstract: Spectral estimation is an important tool in time series analysis, with applications including economics, astronomy, and climatology. The asymptotic theory for non-parametric estimation is well-known but the development of non-asymptotic theory is still ongoing. Our recent work obtained the first non-asymptotic error bounds on the Bartlett and Welch methods for $L$-mixing stochastic processes. The…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 7 pages, 2 figures, Under Review for Conference on Decision and Control 2025

  18. arXiv:2503.23377  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

    Authors: Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, Tat-Seng Chua

    Abstract: This paper introduces JavisDiT, a novel Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG). Built upon the powerful Diffusion Transformer (DiT) architecture, JavisDiT is able to generate high-quality audio and video content simultaneously from open-ended user prompts. To ensure optimal synchronization, we introduce a fine-grained spatio-temporal alignme…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Work in progress. Homepage: https://javisdit.github.io/

  19. arXiv:2503.22687  [pdf, other]

    eess.AS cs.AI

    Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations

    Authors: Jinming Chen, Jingyi Fang, Yuanzhong Zheng, Yaoxuan Wang, Haojun Fei

    Abstract: Emotion recognition plays a pivotal role in intelligent human-machine interaction systems. Multimodal approaches benefit from the fusion of diverse modalities, thereby improving the recognition accuracy. However, the lack of high-quality multimodal data and the challenge of achieving optimal alignment between different modalities significantly limit the potential for improvement in multimodal appr…

    Submitted 5 March, 2025; originally announced March 2025.

  20. arXiv:2503.16845  [pdf, ps, other]

    math.OC eess.SY

    One-Point Residual Feedback Algorithms for Distributed Online Convex and Non-convex Optimization

    Authors: Yaowen Wang, Lipo Mo, Min Zuo, Yuanshi Zheng

    Abstract: This paper mainly addresses the distributed online optimization problem where the local objective functions are assumed to be convex or non-convex. First, the distributed algorithms are proposed for the convex and non-convex situations, where the one-point residual feedback technology is introduced to estimate gradient of local objective functions. Then the regret bounds of the proposed algorithms…

    Submitted 21 March, 2025; originally announced March 2025.

  21. arXiv:2503.08001  [pdf, other]

    eess.SY

    Joint Semantic Transmission and Resource Allocation for Intelligent Computation Task Offloading in MEC Systems

    Authors: Yuanpeng Zheng, Tiankui Zhang, Xidong Mu, Yuanwei Liu, Rong Huang

    Abstract: Mobile edge computing (MEC) enables the provision of high-reliability and low-latency applications by offering computation and storage resources in close proximity to end-users. Different from traditional computation task offloading in MEC systems, the large data volume and complex task computation of artificial intelligence involved intelligent computation task offloading have increased greatly.…

    Submitted 10 March, 2025; originally announced March 2025.

  22. arXiv:2502.21049  [pdf, other]

    cs.CV cs.AI eess.IV

    Synthesizing Individualized Aging Brains in Health and Disease with Generative Models and Parallel Transport

    Authors: Jingru Fu, Yuqi Zheng, Neel Dey, Daniel Ferreira, Rodrigo Moreno

    Abstract: Simulating prospective magnetic resonance imaging (MRI) scans from a given individual brain image is challenging, as it requires accounting for canonical changes in aging and/or disease progression while also considering the individual brain's current status and unique characteristics. While current deep generative models can produce high-resolution anatomically accurate templates for population-w…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 20 pages, 9 figures, 6 tables, diffeomorphic registration, parallel transport, brain aging, medical image generation, Alzheimer's disease

  23. arXiv:2502.10610  [pdf]

    cs.RO eess.SY

    Reachability-Aware Reinforcement Learning for Collision Avoidance in Human-Machine Shared Control

    Authors: Shiyue Zhao, Junzhi Zhang, Neda Masoud, Jianxiong Li, Yinan Zheng, Xiaohui Hou

    Abstract: Human-machine shared control in critical collision scenarios aims to aid drivers' accident avoidance through intervening only when necessary. Existing methods count on replanning collision-free trajectories and imposing human-machine tracking, which usually interrupts the driver's intent and increases the risk of conflict. Additionally, the lack of guaranteed trajectory feasibility under extreme c…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 12 pages, 13 figures, submitted to Advanced Engineering Informatics (ADVEI)

  24. arXiv:2502.08835  [pdf, ps, other]

    math.OC eess.SY

    A Bundle-based Augmented Lagrangian Framework: Algorithm, Convergence, and Primal-dual Principles

    Authors: Feng-Yi Liao, Yang Zheng

    Abstract: We propose a new bundle-based augmented Lagrangian framework for solving constrained convex problems. Unlike the classical (inexact) augmented Lagrangian method (ALM) that has a nested double-loop structure, our framework features a $\textit{single-loop}$ process. Motivated by the proximal bundle method (PBM), we use a $\textit{bundle}$ of past iterates to approximate the subproblem in ALM to get…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 36 pages, 4 Figures

  25. arXiv:2502.07070  [pdf]

    eess.SY hep-ph

    Comprehensive Analysis of Thermal Dissipation in Lithium-Ion Battery Packs

    Authors: Xuguang Zhang, Hexiang Zhang, Amjad Almansour, Mrityunjay Singh, Hengling Zhu, Michael C. Halbig, Yi Zheng

    Abstract: Effective thermal management is critical for lithium-ion battery packs' safe and efficient operations, particularly in applications such as drones, where compact designs and varying airflow conditions present unique challenges. This study investigates the thermal performance of a 16-cell lithium-ion battery pack by optimizing cooling airflow configurations and integrating phase change materials (P…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 20 pages, five figures, introduced the thermal management of Lithium-ion battery

  26. arXiv:2502.03825  [pdf, other]

    eess.IV cs.CR cs.CV

    Synthetic Poisoning Attacks: The Impact of Poisoned MRI Image on U-Net Brain Tumor Segmentation

    Authors: Tianhao Li, Tianyu Zeng, Yujia Zheng, Chulong Zhang, Jingyu Lu, Haotian Huang, Chuangxin Chu, Fang-Fang Yin, Zhenyu Yang

    Abstract: Deep learning-based medical image segmentation models, such as U-Net, rely on high-quality annotated datasets to achieve accurate predictions. However, the increasing use of generative models for synthetic data augmentation introduces potential risks, particularly in the absence of rigorous quality control. In this paper, we investigate the impact of synthetic MRI data on the robustness and segmen…

    Submitted 6 February, 2025; originally announced February 2025.

  27. arXiv:2502.03501  [pdf, other]

    eess.IV cs.LG

    Proxy Prompt: Endowing SAM and SAM 2 with Auto-Interactive-Prompt for Medical Segmentation

    Authors: Wang Xinyi, Kang Hongyu, Wei Peishan, Shuai Li, Yu Sun, Sai Kit Lam, Yongping Zheng

    Abstract: In this paper, we aim to address the unmet demand for automated prompting and enhanced human-model interactions of SAM and SAM2 for the sake of promoting their widespread clinical adoption. Specifically, we propose Proxy Prompt (PP), auto-generated by leveraging non-target data with a pre-annotated mask. We devise a novel 3-step context-selection strategy for adaptively selecting the most represen…

    Submitted 8 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  28. arXiv:2501.18130  [pdf]

    eess.SY

    Waste Animal Bone-derived Calcium Phosphate Particles with High Solar Reflectance

    Authors: Nathaniel LeCompte, Andrew Caratenuto, Yi Zheng

    Abstract: Highly reflective Calcium Phosphate (CAP) nanoparticles have been obtained from waste chicken and porcine bones. Chicken and pork bones have been processed and calcined at temperatures between 600°C and 1200°C to remove organic material and resulting in CAP bio-ceramic compounds with high reflectance. The reflectivity of the materials in the solar wavelength region is on par with chemically synthe…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: 15 pages, 4 figures

  29. arXiv:2412.01053  [pdf, ps, other]

    cs.SD eess.AS

    FreeCodec: A disentangled neural speech codec with fewer tokens

    Authors: Youqiang Zheng, Weiping Tu, Yueteng Kang, Jie Chen, Yike Zhang, Li Xiao, Yuhong Yang, Long Ma

    Abstract: Neural speech codecs have gained great attention for their outstanding reconstruction with discrete token representations. It is a crucial component in generative tasks such as speech coding and large language models (LLM). However, most works based on residual vector quantization perform worse with fewer tokens due to low coding efficiency for modeling complex coupled information. In this p…

    Submitted 28 June, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: 5 pages, 2 figures, 3 tables. Code and Demo page: https://github.com/exercise-book-yq/FreeCodec. Accepted to Interspeech 2025

  30. Multimodal 3D Brain Tumor Segmentation with Adversarial Training and Conditional Random Field

    Authors: Lan Jiang, Yuchao Zheng, Miao Yu, Haiqing Zhang, Fatemah Aladwani, Alessandro Perelli

    Abstract: Accurate brain tumor segmentation remains a challenging task due to structural complexity and great individual differences of gliomas. Leveraging the pre-eminent detail resilience of CRF and spatial feature extraction capacity of V-net, we propose a multimodal 3D Volume Generative Adversarial Network (3D-vGAN) for precise segmentation. The model utilizes Pseudo-3D for V-net improvement, adds condi…

    Submitted 21 November, 2024; originally announced November 2024.

    Comments: 13 pages, 7 figures, Annual Conference on Medical Image Understanding and Analysis (MIUA) 2024

    MSC Class: 15-11 ACM Class: I.4.6; I.5.4

    Journal ref: Medical Image Understanding and Analysis (MIUA), Lecture Notes in Computer Science, Springer, vol. 14859, 2024

  31. arXiv:2411.06782  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    QuadWBG: Generalizable Quadrupedal Whole-Body Grasping

    Authors: Jilong Wang, Javokhirbek Rajabov, Chaoyi Xu, Yiming Zheng, He Wang

    Abstract: Legged robots with advanced manipulation capabilities have the potential to significantly improve household duties and urban maintenance. Despite considerable progress in developing robust locomotion and precise manipulation methods, seamlessly integrating these into cohesive whole-body control for real-world applications remains challenging. In this paper, we present a modular framework for robus…

    Submitted 13 January, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  32. arXiv:2410.22683  [pdf, ps, other]

    math.OC eess.SY

    Inexact Augmented Lagrangian Methods for Conic Programs: Quadratic Growth and Linear Convergence

    Authors: Feng-Yi Liao, Lijun Ding, Yang Zheng

    Abstract: Augmented Lagrangian Methods (ALMs) are widely employed in solving constrained optimizations, and some efficient solvers are developed based on this framework. Under the quadratic growth assumption, it is known that the dual iterates and the Karush-Kuhn-Tucker (KKT) residuals of ALMs applied to semidefinite programs (SDPs) converge linearly. In contrast, the convergence rate of the primal iterates…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: 32 pages, 5 figures

  33. arXiv:2410.05803  [pdf, ps, other]

    eess.SP

    A Radio Map Approach for Reduced Pilot CSI Tracking in Massive MIMO Networks

    Authors: Yuanshuai Zheng, Junting Chen

    Abstract: Massive multiple-input multiple-output (MIMO) systems offer significant potential to enhance wireless communication performance, yet accurate and timely channel state information (CSI) acquisition remains a key challenge. Existing works on CSI estimation and radio map applications typically rely on stationary CSI statistics and accurate location labels. However, the CSI process can be discontinuou…

    Submitted 22 June, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

  34. arXiv:2410.02122  [pdf, ps, other]

    cs.NI eess.SY

    Resource Allocation Based on Optimal Transport Theory in ISAC-Enabled Multi-UAV Networks

    Authors: Yufeng Zheng, Lixin Li, Wensheng Lin, Wei Liang, Qinghe Du, Zhu Han

    Abstract: This paper investigates the resource allocation optimization for cooperative communication with non-cooperative localization in integrated sensing and communications (ISAC)-enabled multi-unmanned aerial vehicle (UAV) cooperative networks. Our goal is to maximize the weighted sum of the system's average sum rate and the localization quality of service (QoS) by jointly optimizing cell association, c…

    Submitted 2 October, 2024; originally announced October 2024.

  35. arXiv:2409.16661  [pdf, ps, other]

    eess.IV

    Morphological-consistent Diffusion Network for Ultrasound Coronal Image Enhancement

    Authors: Yihao Zhou, Zixun Huang, Timothy Tin-Yan Lee, Chonglin Wu, Kelly Ka-Lee Lai, De Yang, Alec Lik-hang Hung, Jack Chun-Yiu Cheng, Tsz-Ping Lam, Yong-ping Zheng

    Abstract: Ultrasound curve angle (UCA) measurement provides a radiation-free and reliable evaluation for scoliosis based on ultrasound imaging. However, degraded image quality, especially in difficult-to-image patients, can prevent clinical experts from making confident measurements, even leading to misdiagnosis. In this paper, we propose a multi-stage image enhancement framework that models high-quality im…

    Submitted 25 September, 2024; originally announced September 2024.

  36. arXiv:2409.16389  [pdf, ps, other]

    math.OC eess.SY

    Willems' Fundamental Lemma for Nonlinear Systems with Koopman Linear Embedding

    Authors: Xu Shang, Jorge Cortés, Yang Zheng

    Abstract: Koopman operator theory and Willems' fundamental lemma both can provide (approximated) data-driven linear representation for nonlinear systems. However, choosing lifting functions for the Koopman operator is challenging, and the quality of the data-driven model from Willems' fundamental lemma has no guarantee for general nonlinear systems. In this paper, we extend Willems' fundamental lemma for a…

    Submitted 23 November, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

  37. arXiv:2409.13167  [pdf, ps, other]

    eess.SP cs.AI

    Unsupervised Attention-Based Multi-Source Domain Adaptation Framework for Drift Compensation in Electronic Nose Systems

    Authors: Wenwen Zhang, Shuhao Hu, Zhengyuan Zhang, Yuanjin Zheng, Qi Jie Wang, Zhiping Lin

    Abstract: Continuous, long-term monitoring of hazardous, noxious, explosive, and flammable gases in industrial environments using electronic nose (E-nose) systems faces the significant challenge of reduced gas identification accuracy due to time-varying drift in gas sensors. To address this issue, we propose a novel unsupervised attention-based multi-source domain shared-private feature fusion adaptation (A…

    Submitted 19 September, 2024; originally announced September 2024.

  38. arXiv:2409.09272  [pdf, other]

    cs.CR cs.AI cs.MM cs.SD eess.AS

    SafeEar: Content Privacy-Preserving Audio Deepfake Detection

    Authors: Xinfeng Li, Kai Li, Yifan Zheng, Chen Yan, Xiaoyu Ji, Wenyuan Xu

    Abstract: Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable performance in generating realistic and natural audio. However, their dark side, audio deepfake poses a significant threat to both society and individuals. Existing countermeasures largely focus on determining the genuineness of speech based on complete original audio recordings, which however often contain private con…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted by ACM CCS 2024. Please cite this paper as "Xinfeng Li, Kai Li, Yifan Zheng, Chen Yan, Xiaoyu Ji, Wenyuan Xu. SafeEar: Content Privacy-Preserving Audio Deepfake Detection. In Proceedings of ACM Conference on Computer and Communications Security (CCS), 2024."

  39. arXiv:2408.15217  [pdf, other]

    eess.IV cs.AI cs.CV

    Fundus2Video: Cross-Modal Angiography Video Generation from Static Fundus Photography with Clinical Knowledge Guidance

    Authors: Weiyi Zhang, Siyu Huang, Jiancheng Yang, Ruoyu Chen, Zongyuan Ge, Yingfeng Zheng, Danli Shi, Mingguang He

    Abstract: Fundus Fluorescein Angiography (FFA) is a critical tool for assessing retinal vascular dynamics and aiding in the diagnosis of eye diseases. However, its invasive nature and less accessibility compared to Color Fundus (CF) images pose significant challenges. Current CF to FFA translation methods are limited to static generation. In this work, we pioneer dynamic FFA video generation from static CF…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by Medical Image Computing and Computer Assisted Intervention Society (MICCAI) 2024

  40. arXiv:2408.14067  [pdf, other]

    eess.SY

    Active Search for Low-altitude UAV Sensing and Communication for Users at Unknown Locations

    Authors: Yuanshuai Zheng, Junting Chen

    Abstract: This paper studies optimal unmanned aerial vehicle (UAV) placement to ensure line-of-sight (LOS) communication and sensing for a cluster of ground users possibly in deep shadow, while the UAV maintains backhaul connectivity with a base station (BS). The key challenges include unknown user locations, uncertain channel model parameters, and unavailable urban structure. Addressing these challenges, t…

    Submitted 1 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  41. arXiv:2408.06049  [pdf]

    eess.IV

    Hardware Architecture Design of Model-Based Image Reconstruction Towards Palm-size Photoacoustic Tomography

    Authors: Yuwei Zheng, Zijian Gao, Yuting Shen, Jiadong Zhang, Daohuai Jiang, Fengyu Liu, Feng Gao, Fei Gao

    Abstract: Photoacoustic (PA) imaging technology combines the advantages of optical imaging and ultrasound imaging, showing great potential in biomedical applications. Many preclinical studies and clinical applications urgently require fast, high-quality, low-cost and portable imaging system. Translating advanced image reconstruction algorithms into hardware implementations is highly desired. However, existi…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 13 figures

  42. arXiv:2408.05596  [pdf, other]

    eess.SP

    Semantic Communications with Explicit Semantic Bases: Model, Architecture, and Open Problems

    Authors: Fengyu Wang, Yuan Zheng, Wenjun Xu, Junxiao Liang, Ping Zhang

    Abstract: The increasing demands for massive data transmission pose great challenges to communication systems. Compared to traditional communication systems that focus on the accurate reconstruction of bit sequences, semantic communications (SemComs), which aim to successfully deliver information connotation, have been regarded as the key technology for next-generation communication systems. Most current Se…

    Submitted 10 August, 2024; originally announced August 2024.

  43. arXiv:2407.20530  [pdf, other]

    cs.SD eess.AS

    SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

    Authors: Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

    Abstract: Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit superior compression performance than conventional methods. Despite significant progress, existing methods still have limitations in preserving and reconstructing fine details for optimal reconstruction, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that ach…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ICASSP 2024

  44. arXiv:2407.19859  [pdf]

    eess.SY

    ProRuka: A highly efficient HMI algorithm for controlling a novel prosthetic hand with 6-DOF using sonomyography

    Authors: Vaheh Nazari, Yong-Ping Zheng

    Abstract: Sonomyography (SMG) is a novel human-machine interface that controls upper-limb prostheses by monitoring forearm muscle activity using ultrasonic imaging. SMG has been investigated for controlling upper-limb prostheses during the last two decades. The results show that this method, in combination with artificial intelligence, can classify different hand gestures with an accuracy of more than 90%,…

    Submitted 29 July, 2024; originally announced July 2024.

  45. arXiv:2407.14894  [pdf, other]

    eess.SY

    A Holistic Optimization Framework for Energy Efficient UAV-assisted Fog Computing: Attitude Control, Trajectory Planning and Task Assignment

    Authors: Shuaijun Liu, Jinqiu Du, Yaxin Zheng, Jiaying Yin, Yuhui Deng, Jingjin Wu

    Abstract: Unmanned Aerial Vehicles (UAVs) have significantly enhanced fog computing by acting as both flexible computation platforms and communication mobile relays. In this paper, we propose a holistic framework that jointly optimizes the total latency and energy consumption for UAV-assisted fog computing in a three-dimensional spatial domain with varying terrain elevations and dynamic task generations. Ou…

    Submitted 5 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: 14 pages, 10 figures

  46. arXiv:2407.07052  [pdf, other]

    eess.IV cs.CV

    Latent Space Imaging

    Authors: Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich

    Abstract: Digital imaging systems have traditionally relied on brute-force measurement and processing of pixels arranged on regular grids. In contrast, the human visual system performs significant data reduction from the large number of photoreceptors to the optic nerve, effectively encoding visual information into a low-bandwidth latent space representation optimized for brain processing. Inspired by this,…

    Submitted 23 March, 2025; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR 2025; see http://github.com/vccimaging/latent-imaging

  47. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  48. arXiv:2407.03026  [pdf, other]

    cs.SD cs.AI eess.AS

    Qifusion-Net: Layer-adapted Stream/Non-stream Model for End-to-End Multi-Accent Speech Recognition

    Authors: Jinming Chen, Jingyi Fang, Yuanzhong Zheng, Yaoxuan Wang, Haojun Fei

    Abstract: Currently, end-to-end (E2E) speech recognition methods have achieved promising performance. However, auto speech recognition (ASR) models still face challenges in recognizing multi-accent speech accurately. We propose a layer-adapted fusion (LAF) model, called Qifusion-Net, which does not require any prior knowledge about the target accent. Based on dynamic chunk strategy, our approach enables str…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2014, 5 pages, 1 figure

  49. arXiv:2407.00529  [pdf, other]

    cs.LG cs.SD eess.AS math.ST stat.ML

    Detecting and Identifying Selection Structure in Sequential Data

    Authors: Yujia Zheng, Zeyu Tang, Yiwen Qiu, Bernhard Schölkopf, Kun Zhang

    Abstract: We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. Since this selection process often distorts statistical analysis, previous work primarily views it as a bias to be corrected and proposes various methods to mitigate its effect. However, while controlling this bias is crucial, selection also offers an opportun…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: ICML 2024

  50. arXiv:2406.17784  [pdf, other]

    eess.SP

    Scalable Near-Field Localization Based on Partitioned Large-Scale Antenna Array

    Authors: Xiaojun Yuan, Yuqing Zheng, Mingchen Zhang, Boyu Teng, Wenjun Jiang

    Abstract: This paper studies a passive localization system, where an extremely large-scale antenna array (ELAA) is deployed at the base station (BS) to locate a user equipment (UE) residing in its near-field (Fresnel) region. We propose a novel algorithm, named array partitioning-based location estimation (APLE), for scalable near-field localization. The APLE algorithm is developed based on the basic assump…

    Submitted 13 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2312.12342