
Showing 1–50 of 1,057 results for author: Wang, H

Searching in archive eess.
  1. arXiv:2505.09986  [pdf, other]

    cs.CV eess.IV

    High Quality Underwater Image Compression with Adaptive Correction and Codebook-based Augmentation

    Authors: Yimin Zhou, Yichong Xia, Sicheng Pan, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the increasing exploration and exploitation of the underwater world, underwater images have become a critical medium for human interaction with marine environments, driving extensive research into their efficient transmission and storage. However, contemporary underwater image compression algorithms fail to fully leverage the unique characteristics distinguishing underwater scenes from terres…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09112  [pdf, ps, other]

    eess.SP

    Mainlobe Jamming Suppression Using MIMO-STCA Radar

    Authors: Huake Wang, Bairui Cai, Guisheng Liao

    Abstract: Radar jamming suppression, particularly against mainlobe jamming, has become a critical focus in modern radar systems. This article investigates advanced mainlobe jamming suppression techniques utilizing a novel multiple-input multiple-output space-time coding array (MIMO-STCA) radar. Extending the capabilities of traditional MIMO radar, the MIMO-STCA framework introduces additional degrees of fre…

    Submitted 13 May, 2025; originally announced May 2025.

  3. arXiv:2505.08559  [pdf, ps, other]

    math.OC eess.SY

    Synthesis of safety certificates for discrete-time uncertain systems via convex optimization

    Authors: Marta Fochesato, Han Wang, Antonis Papachristodoulou, Paul Goulart

    Abstract: We study the problem of co-designing control barrier functions and linear state feedback controllers for discrete-time linear systems affected by additive disturbances. For disturbances of bounded magnitude, we provide a semi-definite program whose feasibility implies the existence of a control law and a certificate ensuring safety in the infinite horizon with respect to the worst-case disturbance…

    Submitted 13 May, 2025; originally announced May 2025.

  4. arXiv:2505.08247  [pdf, ps, other]

    eess.IV cs.CV

    Skeleton-Guided Diffusion Model for Accurate Foot X-ray Synthesis in Hallux Valgus Diagnosis

    Authors: Midi Wan, Pengfei Li, Yizhuo Liang, Di Wu, Yushan Pan, Guangzhen Zhu, Hao Wang

    Abstract: Medical image synthesis plays a crucial role in providing anatomically accurate images for diagnosis and treatment. Hallux valgus, which affects approximately 19% of the global population, requires frequent weight-bearing X-rays for assessment, placing additional strain on both patients and healthcare providers. Existing X-ray models often struggle to balance image fidelity, skeletal consistency,…

    Submitted 13 May, 2025; originally announced May 2025.

  5. arXiv:2505.06495  [pdf, other]

    eess.SP

    Monopulse Parameter Estimation based on MIMO-STCA Radar in the Presence of Multiple Mainlobe Jammings

    Authors: Huake Wang, Dongchang Zhang, Guisheng Liao, Yinghui Quan

    Abstract: The monopulse technique is characterized by its high accuracy in angle estimation and simplicity in engineering implementation. However, in the complex electromagnetic environment, the presence of the mainlobe jamming (MLJ) greatly degrades the accuracy of angle estimation. Conventional methods of jamming suppression often lead to significant deviations in monopulse ratio while suppressing MLJ. Ad…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 10 pages, 15 figures

  6. arXiv:2505.06007  [pdf, other]

    eess.SP

    Quantum Noise Limited Temperature-Change Estimation for Phase-OTDR Employing Coherent Detection

    Authors: Huwei Wang, Roman Ermakov, Francesco Da Ros, Darko Zibar

    Abstract: The quantum limit is a fundamental lower bound on the uncertainty when estimating a parameter in a system dominated by the minimum amount of noise (quantum noise). For the first time, we derive and demonstrate a quantum limit for temperature-change estimation for coherent phase-OTDR sensing-systems.

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: 4 pages, 4 figures

  7. arXiv:2505.05870  [pdf, ps, other]

    cs.CV cs.AI eess.IV

    Towards Facial Image Compression with Consistency Preserving Diffusion Prior

    Authors: Yimin Zhou, Yichong Xia, Bin Chen, Baoyi An, Haoqian Wang, Zhi Wang, Yaowei Wang, Zikun Zhou

    Abstract: With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reco…

    Submitted 9 May, 2025; originally announced May 2025.

  8. arXiv:2505.05796  [pdf, ps, other]

    eess.SY cs.AI math.OC

    Human-in-the-Loop AI for HVAC Management Enhancing Comfort and Energy Efficiency

    Authors: Xinyu Liang, Frits de Nijs, Buser Say, Hao Wang

    Abstract: Heating, Ventilation, and Air Conditioning (HVAC) systems account for approximately 38% of building energy consumption globally, making them one of the most energy-intensive services. The increasing emphasis on energy efficiency and sustainability, combined with the need for enhanced occupant comfort, presents a significant challenge for traditional HVAC systems. These systems often fail to dynami…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: ACM e-Energy 2025

  9. arXiv:2505.05159  [pdf, other]

    eess.AS

    FlexSpeech: Towards Stable, Controllable and Expressive Text-to-Speech

    Authors: Linhan Ma, Dake Guo, He Wang, Jin Xu, Lei Xie

    Abstract: Current speech generation research can be categorized into two primary classes: non-autoregressive and autoregressive. The fundamental distinction between these approaches lies in the duration prediction strategy employed for predictable-length sequences. The NAR methods ensure stability in speech generation by explicitly and independently modeling the duration of each phonetic unit. Conversely, A…

    Submitted 15 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

  10. arXiv:2505.04105  [pdf]

    eess.IV cs.CV

    MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction

    Authors: Andrew Zhang, Hao Wang, Shuchang Ye, Michael Fulham, Jinman Kim

    Abstract: Patient motion during medical image acquisition causes blurring, ghosting, and distorts organs, which makes image interpretation challenging. Current state-of-the-art algorithms using Generative Adversarial Network (GAN)-based methods with their ability to learn the mappings between corrupted images and their ground truth via Structural Similarity Index Measure (SSIM) loss effectively generate mot…

    Submitted 8 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  11. arXiv:2505.03380  [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  12. arXiv:2505.02628  [pdf, other]

    eess.IV cs.CV

    DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction

    Authors: Yiqun Lin, Hualiang Wang, Jixiang Chen, Jiewen Yang, Jiarong Guo, Xiaomeng Li

    Abstract: Cone-beam computed tomography (CBCT) is a critical 3D imaging technology in the medical field, while the high radiation exposure required for high-quality imaging raises significant concerns, particularly for vulnerable populations. Sparse-view reconstruction reduces radiation by using fewer X-ray projections while maintaining image quality, yet existing methods face challenges such as high comput…

    Submitted 5 May, 2025; originally announced May 2025.

  13. arXiv:2505.01476  [pdf, other]

    eess.IV cs.AI cs.CV

    CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering

    Authors: Zhe Zhang, Mingxiu Cai, Hanxiao Wang, Gaochang Wu, Tianyou Chai, Xiatian Zhu

    Abstract: Unsupervised anomaly detection (UAD) seeks to localize the anomaly mask of an input image with respect to normal samples. Either by reconstructing normal counterparts (reconstruction-based) or by learning an image feature embedding space (embedding-based), existing approaches fundamentally rely on image-level or feature-level matching to derive anomaly scores. Often, such a matching process is ina…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 20 pages, 11 figures, 10 tables, accepted by the Forty-Second International Conference on Machine Learning (ICML 2025)

  14. arXiv:2505.01212  [pdf, other]

    cs.CV eess.IV

    High Dynamic Range Novel View Synthesis with Single Exposure

    Authors: Kaixuan Zhang, Hu Wang, Minxian Li, Mingwu Ren, Mao Ye, Xiatian Zhu

    Abstract: High Dynamic Range Novel View Synthesis (HDR-NVS) aims to establish a 3D scene HDR model from Low Dynamic Range (LDR) imagery. Typically, multiple-exposure LDR images are employed to capture a wider range of brightness levels in a scene, as a single LDR image cannot represent both the brightest and darkest regions simultaneously. While effective, this multiple-exposure HDR-NVS approach has signifi…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: It has been accepted by ICML 2025

  15. arXiv:2504.19091  [pdf, other]

    eess.SP

    A Tutorial on MIMO-OFDM ISAC: From Far-Field to Near-Field

    Authors: Qianglong Dai, Yong Zeng, Huizhi Wang, Changsheng You, Chao Zhou, Hongqiang Cheng, Xiaoli Xu, Shi Jin, A. Lee Swindlehurst, Yonina C. Eldar, Robert Schober, Rui Zhang, Xiaohu You

    Abstract: Integrated sensing and communication (ISAC) is one of the key usage scenarios for future sixth-generation (6G) mobile communication networks, where communication and sensing (C&S) services are simultaneously provided through shared wireless spectrum, signal processing modules, hardware, and network infrastructure. Such an integration is strengthened by the technology trends in 6G, such as denser n…

    Submitted 26 April, 2025; originally announced April 2025.

  16. arXiv:2504.17139  [pdf, other]

    eess.SY

    Opt-ODENet: A Neural ODE Framework with Differentiable QP Layers for Safe and Stable Control Design (longer version)

    Authors: Keyan Miao, Liqun Zhao, Han Wang, Konstantinos Gatsis, Antonis Papachristodoulou

    Abstract: Designing controllers that achieve task objectives while ensuring safety is a key challenge in control systems. This work introduces Opt-ODENet, a Neural ODE framework with a differentiable Quadratic Programming (QP) optimization layer to enforce constraints as hard requirements. Eliminating the reliance on nominal controllers or large datasets, our framework solves the optimal control problem dir…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 19 pages

  17. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  18. arXiv:2504.10352  [pdf, other]

    eess.AS cs.CL

    Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

    Authors: Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen

    Abstract: Recent zero-shot text-to-speech (TTS) systems face a common dilemma: autoregressive (AR) models suffer from slow generation and lack duration controllability, while non-autoregressive (NAR) models lack temporal modeling and typically require complex designs. In this paper, we introduce a novel pseudo-autoregressive (PAR) codec language modeling approach that unifies AR and NAR modeling. Combining…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Submitted to ACM MM 2025

  19. arXiv:2504.10137  [pdf, other]

    cs.IT eess.SP

    Multi-Target Position Error Bound and Power Allocation Scheme for Cell-Free mMIMO-OTFS ISAC Systems

    Authors: Yifei Fan, Shaochuan Wu, Haojie Wang, Mingjun Sun, Jianhe Wang

    Abstract: This paper investigates multi-target position estimation in cell-free massive multiple-input multiple-output (CF mMIMO) architectures, where orthogonal time frequency and space (OTFS) is used as an integrated sensing and communication (ISAC) signal. Closed-form expressions for the Cramér-Rao lower bound and the positioning error bound (PEB) in multi-target position estimation are derived, providin…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: This work is submitted to IEEE for possible publication

  20. arXiv:2504.07498  [pdf, other]

    eess.SP

    Learning Joint Source-Channel Encoding in IRS-assisted Multi-User Semantic Communications

    Authors: Haidong Wang, Songhan Zhao, Lanhua Li, Bo Gu, Jing Xu, Shimin Gong, Jiawen Kang

    Abstract: In this paper, we investigate a joint source-channel encoding (JSCE) scheme in an intelligent reflecting surface (IRS)-assisted multi-user semantic communication system. Semantic encoding not only compresses redundant information, but also enhances information orthogonality in a semantic feature space. Meanwhile, the IRS can adjust the spatial orthogonality, enabling concurrent multi-user semantic…

    Submitted 10 April, 2025; originally announced April 2025.

  21. arXiv:2504.06173  [pdf, other]

    cs.NI cs.AI cs.ET cs.LG eess.SP

    Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning

    Authors: Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang

    Abstract: Beamforming techniques are considered as essential parts to compensate severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and formulate narrow beams to obtain satisfactory received powers. However, performing accurate beam alignment over narrow beams for efficient link configuration by traditional standard defined beam selectio…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: 15 pages

    Journal ref: IEEE Transactions on Cognitive Communications and Networking, 2025

  22. arXiv:2504.04533  [pdf, other]

    eess.SY

    Confidence-Aware Learning Optimal Terminal Guidance via Gaussian Process Regression

    Authors: Han Wang, Donghe Chen, Tengjie Zheng, Lin Cheng, Shengping Gong

    Abstract: Modern aerospace guidance systems demand rigorous constraint satisfaction, optimal performance, and computational efficiency. Traditional analytical methods struggle to simultaneously satisfy these requirements. While data driven methods have shown promise in learning optimal guidance strategy, challenges still persist in generating well-distributed optimal dataset and ensuring the reliability and…

    Submitted 6 April, 2025; originally announced April 2025.

  23. arXiv:2504.02880  [pdf]

    eess.IV cs.AI cs.CV

    Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms

    Authors: Junchi Zhou, Haozhou Wang, Yoichiro Kato, Tejasri Nampally, P. Rajalakshmi, M. Balram, Keisuke Katsura, Hao Lu, Yue Mu, Wanneng Yang, Yangmingrui Gao, Feng Xiao, Hongtao Chen, Yuhao Chen, Wenjuan Li, Jingwen Wang, Fenghua Yu, Jian Zhou, Wensheng Wang, Xiaochun Hu, Yuanzhu Yang, Yanfeng Ding, Wei Guo, Shouyang Liu

    Abstract: Developing computer vision-based rice phenotyping techniques is crucial for precision field management and accelerating breeding, thereby continuously advancing rice production. Among phenotyping tasks, distinguishing image components is a key prerequisite for characterizing plant growth and development at the organ scale, enabling deeper insights into eco-physiological processes. However, due to…

    Submitted 2 April, 2025; originally announced April 2025.

  24. Brightness Perceiving for Recursive Low-Light Image Enhancement

    Authors: Haodian Wang, Long Peng, Yuejin Sun, Zengyu Wan, Yang Wang, Yang Cao

    Abstract: Due to the wide dynamic range in real low-light scenes, there will be large differences in the degree of contrast degradation and detail blurring of captured images, making it difficult for existing end-to-end methods to enhance low-light images to normal exposure. To address the above issue, we decompose low-light image enhancement into a recursive enhancement task and propose a brightness-percei…

    Submitted 3 April, 2025; originally announced April 2025.

    Journal ref: IEEE Transactions on Artificial Intelligence, vol. 5, no. 6, pp. 3034–3045 (2023)

  25. arXiv:2504.01806  [pdf, other]

    eess.SY cs.RO

    Quattro: Transformer-Accelerated Iterative Linear Quadratic Regulator Framework for Fast Trajectory Optimization

    Authors: Yue Wang, Haoyu Wang, Zhaoxing Li

    Abstract: Real-time optimal control remains a fundamental challenge in robotics, especially for nonlinear systems with stringent performance requirements. As one of the representative trajectory optimization algorithms, the iterative Linear Quadratic Regulator (iLQR) faces limitations due to their inherently sequential computational nature, which restricts the efficiency and applicability of real-time contr…

    Submitted 3 April, 2025; v1 submitted 2 April, 2025; originally announced April 2025.

  26. arXiv:2503.23446  [pdf, other]

    cs.NI cs.IT eess.SP

    Semantic Communication for the Internet of Space: New Architecture, Challenges, and Future Vision

    Authors: Hanlin Cai, Houtianfu Wang, Haofan Dong, Ozgur B. Akan

    Abstract: The expansion of sixth-generation (6G) wireless networks into space introduces technical challenges that conventional bit-oriented communication approaches cannot efficiently address, including intermittent connectivity, severe latency, limited bandwidth, and constrained onboard resources. To overcome these limitations, semantic communication has emerged as a transformative paradigm, shifting the…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 9 pages, 6 figures

  27. arXiv:2503.22200  [pdf, other]

    cs.SD cs.CV eess.AS

    Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization

    Authors: Haomin Zhang, Sizhe Shan, Haoyu Wang, Zihao Chen, Xiulong Liu, Chaofan Ding, Xinhan Di

    Abstract: Creating high-quality sound effects from videos and text prompts requires precise alignment between visual and audio domains, both semantically and temporally, along with step-by-step guidance for professional audio generation. However, current state-of-the-art video-guided audio generation models often fall short of producing high-quality audio for both general and specialized use cases. To addre…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 10 pages, 4 figures

  28. arXiv:2503.21818  [pdf]

    eess.IV cs.CV

    Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis

    Authors: Tianqi Tu, Hui Wang, Jiangbo Pei, Xiaojuan Yu, Aidong Men, Suxia Wang, Qingchao Chen, Ying Tan, Feng Yu, Minghui Zhao

    Abstract: Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment…

    Submitted 26 March, 2025; originally announced March 2025.

  29. arXiv:2503.19591  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Boosting the Transferability of Audio Adversarial Examples with Acoustic Representation Optimization

    Authors: Weifei Jin, Junjie Su, Hejia Wang, Yulin Ye, Jie Hao

    Abstract: With the widespread application of automatic speech recognition (ASR) systems, their vulnerability to adversarial attacks has been extensively studied. However, most existing adversarial examples are generated on specific individual models, resulting in a lack of transferability. In real-world scenarios, attackers often cannot access detailed information about the target model, making query-based…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to ICME 2025

  30. arXiv:2503.18521  [pdf, other]

    eess.SY math.OC

    Constraint Horizon in Model Predictive Control

    Authors: Allan Andre Do Nascimento, Han Wang, Antonis Papachristodoulou, Kostas Margellos

    Abstract: In this work, we propose a Model Predictive Control (MPC) formulation incorporating two distinct horizons: a prediction horizon and a constraint horizon. This approach enables a deeper understanding of how constraints influence key system properties such as suboptimality, without compromising recursive feasibility and constraint satisfaction. In this direction, our contributions are twofold. First…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: submitted to L-CSS

  31. arXiv:2503.16635  [pdf, other]

    eess.IV cs.CV

    Fed-NDIF: A Noise-Embedded Federated Diffusion Model For Low-Count Whole-Body PET Denoising

    Authors: Yinchi Zhou, Huidong Xie, Menghua Xia, Qiong Liu, Bo Zhou, Tianqi Chen, Jun Hou, Liang Guo, Xinyuan Zheng, Hanzhong Wang, Biao Li, Axel Rominger, Kuangyu Shi, Nicha C. Dvorneka, Chi Liu

    Abstract: Low-count positron emission tomography (LCPET) imaging can reduce patients' exposure to radiation but often suffers from increased image noise and reduced lesion detectability, necessitating effective denoising techniques. Diffusion models have shown promise in LCPET denoising for recovering degraded image quality. However, training such models requires large and diverse datasets, which are challe…

    Submitted 20 March, 2025; originally announced March 2025.

  32. arXiv:2503.16578  [pdf, other]

    cs.CL cs.SD eess.AS

    SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

    Authors: Yang Chen, Hui Wang, Shiyao Wang, Junyang Chen, Jiabei He, Jiaming Zhou, Xi Yang, Yequan Wang, Yonghua Lin, Yong Qin

    Abstract: While voice technologies increasingly serve aging populations, current systems exhibit significant performance gaps due to inadequate training data capturing elderly-specific vocal characteristics like presbyphonia and dialectal variations. The limited data available on super-aged individuals in existing elderly speech datasets, coupled with overly simple recording styles and annotation dimensions…

    Submitted 20 March, 2025; originally announced March 2025.

  33. arXiv:2503.16055  [pdf, other]

    eess.IV cs.CV

    SALT: Singular Value Adaptation with Low-Rank Transformation

    Authors: Abdelrahman Elsayed, Sarim Hashmi, Mohammed Elseiagy, Hu Wang, Mohammad Yaqub, Ibrahim Almakky

    Abstract: The complex nature of medical image segmentation calls for models that are specifically designed to capture detailed, domain-specific features. Large foundation models offer considerable flexibility, yet the cost of fine-tuning these models remains a significant barrier. Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), efficiently update model weights with low-ra…

    Submitted 20 March, 2025; originally announced March 2025.

  34. arXiv:2503.14535  [pdf, other]

    cs.CV cs.AI eess.IV

    Interpretable Unsupervised Joint Denoising and Enhancement for Real-World low-light Scenarios

    Authors: Huaqiu Li, Xiaowan Hu, Haoqian Wang

    Abstract: Real-world low-light images often suffer from complex degradations such as local overexposure, low brightness, noise, and uneven illumination. Supervised methods tend to overfit to specific scenarios, while unsupervised methods, though better at generalization, struggle to model these degradations due to the lack of reference images. To address this issue, we propose an interpretable, zero-referen…

    Submitted 17 March, 2025; originally announced March 2025.

  35. arXiv:2503.13479  [pdf, other]

    eess.SP

    EAGLE: Contextual Point Cloud Generation via Adaptive Continuous Normalizing Flow with Self-Attention

    Authors: Linhao Wang, Qichang Zhang, Yifan Yang, Hao Wang

    Abstract: As 3D point clouds become the prevailing shape representation in computer vision, how to generate high-resolution point clouds has become a pressing issue. Flow-based generative models can effectively perform point cloud generation tasks. However, traditional CNN-based flow architectures rely only on local information to extract features, making it difficult to capture global contextual informatio…

    Submitted 4 March, 2025; originally announced March 2025.

  36. arXiv:2503.03971  [pdf, other]

    eess.IV

    Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

    Authors: Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang, Michael Tänzer, Qirong Li, Xinran Chen, Jiahao Huang, Yinzhe Wu, Kian Anvari Hamedani, Yuntong Lyu, Longyu Sun, Qing Li, Ziqiang Xu, Bingyu Xin, Dimitris N. Metaxas, Narges Razizadeh, Shahabedin Nabavi, George Yiasemis, et al. (34 additional authors not shown)

    Abstract: Cardiovascular magnetic resonance (CMR) imaging offers diverse contrasts for non-invasive assessment of cardiac function and myocardial characterization. However, CMR often requires the acquisition of many contrasts, and each contrast takes a considerable amount of time. The extended acquisition time will further increase the susceptibility to motion artifacts. Existing deep learning-based reconst…

    Submitted 13 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

  37. arXiv:2503.02064  [pdf, other]

    eess.IV cs.CV

    CrossFusion: A Multi-Scale Cross-Attention Convolutional Fusion Model for Cancer Survival Prediction

    Authors: Rustin Soraki, Huayu Wang, Joann G. Elmore, Linda Shapiro

    Abstract: Cancer survival prediction from whole slide images (WSIs) is a challenging task in computational pathology due to the large size, irregular shape, and high granularity of the WSIs. These characteristics make it difficult to capture the full spectrum of patterns, from subtle cellular abnormalities to complex tissue interactions, which are crucial for accurate prognosis. To address this, we propose…

    Submitted 3 March, 2025; originally announced March 2025.

  38. arXiv:2503.01549  [pdf, other]

    eess.SY

    Patterning Silver Nanowire Network via the Gibbs-Thomson Effect

    Authors: Hongteng Wang, Haichuan Li, Yijia Xin, Weizhen Chen, Haogen Liu, Ying Chen, Yaofei Chen, Lei Chen, Yunhan Luo, Zhe Chen, Gui-Shi Liu

    Abstract: As transparent electrodes, patterned silver nanowire (AgNW) networks suffer from noticeable pattern visibility, which is an unsettled issue for practical applications such as display. Here, we introduce a Gibbs-Thomson effect (GTE)-based patterning method to effectively reduce pattern visibility. Unlike conventional top-down and bottom-up strategies that rely on selective etching, removal, or depo…

    Submitted 3 March, 2025; originally announced March 2025.

  39. arXiv:2503.00084  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation

    Authors: Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma

    Abstract: We introduce InspireMusic, a framework integrated super resolution and large language model for high-fidelity long-form music generation. A unified framework generates high-fidelity music, songs, and audio, which incorporates an autoregressive transformer with a super-resolution flow-matching model. This framework enables the controllable generation of high-fidelity long-form music at a higher sam…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: Work in progress. Correspondence regarding this technical report should be directed to {chong.zhang, yukun.ma}@alibaba-inc.com. Online demo available on https://modelscope.cn/studios/iic/InspireMusic and https://huggingface.co/spaces/FunAudioLLM/InspireMusic

  40. arXiv:2502.18981  [pdf]

    eess.SY eess.SP

    Polarization Angle Scanning for Wide-band Millimeter-wave Direct Detection

    Authors: Heyao Wang, Ziran Zhao, Lingbo Qiao, Dalu Guo

    Abstract: Millimeter-wave (MMW) technology has been widely utilized in human security screening applications due to its superior penetration capabilities through clothing and safety for human exposure. However, existing methods largely rely on fixed polarization modes, neglecting the potential insights from variations in target echoes with respect to incident polarization. This study provides a theoretical…

    Submitted 26 February, 2025; originally announced February 2025.

  41. arXiv:2502.18913  [pdf, other]

    cs.CL cs.SD eess.AS

    CS-Dialogue: A 104-Hour Dataset of Spontaneous Mandarin-English Code-Switching Dialogues for Speech Recognition

    Authors: Jiaming Zhou, Yujie Guo, Shiwan Zhao, Haoqin Sun, Hui Wang, Jiabei He, Aobo Kong, Shiyao Wang, Xi Yang, Yequan Wang, Yonghua Lin, Yong Qin

    Abstract: Code-switching (CS), the alternation between two or more languages within a single conversation, presents significant challenges for automatic speech recognition (ASR) systems. Existing Mandarin-English code-switching datasets often suffer from limitations in size, spontaneity, and the lack of full-length dialogue recordings with transcriptions, hindering the development of robust ASR models for r…

    Submitted 11 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  42. Transfer Learning Assisted Fast Design Migration Over Technology Nodes: A Study on Transformer Matching Network

    Authors: Chenhao Chu, Yuhao Mao, Hua Wang

    Abstract: In this study, we introduce an innovative methodology for the design of mm-Wave passive networks that leverages knowledge transfer from a pre-trained synthesis neural network (NN) model in one technology node and achieves swift and reliable design adaptation across different integrated circuit (IC) technologies, operating frequencies, and metal options. We prove this concept through simulation-bas…

    Submitted 11 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

    Comments: Published and presented at IEEE MTT-S International Microwave Symposium (IMS 2024), Washington, DC, USA

  43. arXiv:2502.17499  [pdf]

    eess.SP cs.AI cs.LG math.NA

    Detecting Long QT Syndrome and First-Degree Atrioventricular Block using Single-Lead AI-ECG: A Multi-Center Real-World Study

    Authors: Sumei Fan, Deyun Zhang, Yue Wang, Shijia Geng, Kun Lu, Meng Sang, Weilun Xu, Haixue Wang, Qinghao Zhao, Chuandong Cheng, Peng Wang, Shenda Hong

    Abstract: Home-based single-lead AI-ECG devices have enabled continuous, real-world cardiac monitoring. However, the accuracy of parameter calculations from single-lead AI-ECG algorithms remains to be fully validated, which is critical for conditions such as Long QT Syndrome (LQTS) and First-Degree Atrioventricular Block (AVBI). In this multicenter study, we assessed FeatureDB, an ECG measurements computatio…

    Submitted 26 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures, 8 tables

  44. arXiv:2502.16142  [pdf, ps, other]

    cs.CL eess.AS

    Understanding Zero-shot Rare Word Recognition Improvements Through LLM Integration

    Authors: Haoxuan Wang

    Abstract: In this study, we investigate the integration of a large language model (LLM) with an automatic speech recognition (ASR) system, specifically focusing on enhancing rare word recognition performance. Using a 190,000-hour dataset primarily sourced from YouTube, pre-processed with Whisper V3 pseudo-labeling, we demonstrate that the LLM-ASR architecture outperforms traditional Zipformer-Transducer mod…

    Submitted 22 February, 2025; originally announced February 2025.

  45. arXiv:2502.15777  [pdf, other]

    eess.SY cs.AI

    TSS GAZ PTP: Towards Improving Gumbel AlphaZero with Two-stage Self-play for Multi-constrained Electric Vehicle Routing Problems

    Authors: Hui Wang, Xufeng Zhang, Xiaoyu Zhang, Zhenhuan Ding, Chaoxu Mu

    Abstract: Recently, Gumbel AlphaZero (GAZ) was proposed to solve classic combinatorial optimization problems such as TSP and JSSP by creating a carefully designed competition model (consisting of a learning player and a competitor player), which leverages the idea of self-play. However, if the competitor is too strong or too weak, the effectiveness of self-play training can be reduced, particularly in compl…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: 11 pages, 9 figures

  46. arXiv:2502.14727  [pdf, other]

    cs.SD cs.AI eess.AS

    WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models

    Authors: Yifu Chen, Shengpeng Ji, Haoxiao Wang, Ziqing Wang, Siyu Chen, Jinzheng He, Jin Xu, Zhou Zhao

    Abstract: Retrieval Augmented Generation (RAG) has gained widespread adoption owing to its capacity to empower large language models (LLMs) to integrate external knowledge. However, existing RAG frameworks are primarily designed for text-based LLMs and rely on Automatic Speech Recognition to process speech input, which discards crucial audio information, risks transcription errors, and increases computation…

    Submitted 20 February, 2025; originally announced February 2025.

  47. arXiv:2502.14584  [pdf, other]

    eess.IV cs.CV

    Vision Foundation Models in Medical Image Analysis: Advances and Challenges

    Authors: Pengchen Liang, Bin Pu, Haishan Huang, Yiwei Li, Hualiang Wang, Weibo Ma, Qing Chang

    Abstract: The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and Segment Anything Model (SAM), has sparked significant advances in the field of medical image analysis. These models have demonstrated exceptional capabilities in capturing long-range dependencies and achieving high generalization in segmentation tasks. However, adapting these large models to medica…

    Submitted 20 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

    Comments: 17 pages, 1 figure

  48. arXiv:2502.13192  [pdf, other]

    eess.IV

    SpeHeatal: A Cluster-Enhanced Segmentation Method for Sperm Morphology Analysis

    Authors: Yi Shi, Yunkai Wang, Xupeng Tian, Tieyi Zhang, Bing Yao, Hui Wang, Yong Shao, Cencen Wang, Rong Zeng

    Abstract: The accurate assessment of sperm morphology is crucial in andrological diagnostics, where the segmentation of sperm images presents significant challenges. Existing approaches frequently rely on large annotated datasets and often struggle with the segmentation of overlapping sperm and the presence of dye impurities. To address these challenges, this paper first analyzes the issue of overlapping sp…

    Submitted 18 February, 2025; originally announced February 2025.

    Comments: AAAI 2025

  49. arXiv:2502.12735  [pdf, other]

    eess.IV eess.SP

    Task-Oriented Semantic Communication for Stereo-Vision 3D Object Detection

    Authors: Zijian Cao, Hua Zhang, Le Liang, Haotian Wang, Shi Jin, Geoffrey Ye Li

    Abstract: With the development of computer vision, 3D object detection has become increasingly important in many real-world applications. Limited by the computing power of sensor-side hardware, the detection task is sometimes deployed on remote computing devices or the cloud to execute complex algorithms, which brings massive data transmission overhead. In response, this paper proposes an optical flow-drive…

    Submitted 18 February, 2025; originally announced February 2025.

  50. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.