Showing 1–50 of 149 results for author: Xu, K

Searching in archive eess.
  1. arXiv:2507.06020  [pdf]

    eess.SP cs.NE

    A Differential Evolution Algorithm with Neighbor-hood Mutation for DOA Estimation

    Authors: Bo Zhou, Kaijie Xu, Yinghui Quan, Mengdao Xing

    Abstract: Two-dimensional (2D) Multiple Signal Classification algorithm is a powerful technique for high-resolution direction-of-arrival (DOA) estimation in array signal processing. However, the exhaustive search over the 2D angular domain leads to high computational cost, limiting its applicability in real-time scenarios. In this work, we reformulate the peak-finding process as a multimodal optimization…

    Submitted 8 July, 2025; originally announced July 2025.

  2. arXiv:2506.19476  [pdf, ps, other]

    eess.SP

    Neural Collapse based Deep Supervised Federated Learning for Signal Detection in OFDM Systems

    Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li

    Abstract: Future wireless networks are expected to be AI-empowered, making their performance highly dependent on the quality of training datasets. However, physical-layer entities often observe only partial wireless environments characterized by different power delay profiles. Federated learning is capable of addressing this limited observability, but often struggles with data heterogeneity. To tackle this…

    Submitted 24 June, 2025; originally announced June 2025.

  3. arXiv:2506.19455  [pdf, ps, other]

    eess.IV cs.CV

    Angio-Diff: Learning a Self-Supervised Adversarial Diffusion Model for Angiographic Geometry Generation

    Authors: Zhifeng Wang, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu, Kunlun He

    Abstract: Vascular diseases pose a significant threat to human health, with X-ray angiography established as the gold standard for diagnosis, allowing for detailed observation of blood vessels. However, angiographic X-rays expose personnel and patients to higher radiation levels than non-angiographic X-rays, which are unwanted. Thus, modality translation from non-angiographic to angiographic X-rays is desir…

    Submitted 24 June, 2025; originally announced June 2025.

  4. arXiv:2505.22568  [pdf]

    eess.IV cs.CV

    Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels

    Authors: Aravind R. Krishnan, Thomas Z. Li, Lucas W. Remedios, Michael E. Kim, Chenyu Gao, Gaurav Rudravaram, Elyssa M. McMaster, Adam M. Saunders, Shunxing Bao, Kaiwen Xu, Lianrui Zuo, Kim L. Sandler, Fabien Maldonado, Yuankai Huo, Bennett A. Landman

    Abstract: Reconstruction kernels in computed tomography (CT) affect spatial resolution and noise characteristics, introducing systematic variability in quantitative imaging measurements such as emphysema quantification. Choosing an appropriate kernel is therefore essential for consistent quantitative analysis. We propose a multipath cycleGAN model for CT kernel harmonization, trained on a mixture of paired…

    Submitted 28 May, 2025; originally announced May 2025.

  5. arXiv:2504.14641  [pdf, ps, other]

    cs.SE eess.SY

    HLSTester: Efficient Testing of Behavioral Discrepancies with LLMs for High-Level Synthesis

    Authors: Kangwei Xu, Bing Li, Grace Li Zhang, Ulf Schlichtmann

    Abstract: In high-level synthesis (HLS), C/C++ programs with synthesis directives are used to generate circuits for FPGA implementations. However, hardware-specific and platform-dependent characteristics in these implementations can introduce behavioral discrepancies between the original C/C++ programs and the circuits after high-level synthesis. Existing methods for testing behavioral discrepancies in HLS…

    Submitted 9 July, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

    Comments: arXiv admin note: text overlap with arXiv:2407.03889

  6. arXiv:2504.13394  [pdf]

    eess.SP

    A Data-centric Supervised Transfer Learning Framework for DOA Estimation with Array Imperfections

    Authors: Bo Zhou, Kaijie Xu, Yinghui Quan, Mengdao Xing

    Abstract: In practical scenarios, processes such as sensor design, manufacturing, and installation will introduce certain errors. Furthermore, mutual interference occurs when the sensors receive signals. These defects in array systems are referred to as array imperfections, which can significantly degrade the performance of Direction of Arrival (DOA) estimation. In this study, we propose a deep-learning bas…

    Submitted 7 July, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  7. arXiv:2503.12758  [pdf, other]

    cs.CV eess.IV

    VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis

    Authors: Zhifeng Wang, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu

    Abstract: Angiography imaging is a medical imaging technique that enhances the visibility of blood vessels within the body by using contrast agents. Angiographic images can effectively assist in the diagnosis of vascular diseases. However, contrast agents may bring extra radiation exposure which is harmful to patients with health risks. To mitigate these concerns, in this paper, we aim to automatically gene…

    Submitted 16 March, 2025; originally announced March 2025.

  8. arXiv:2502.08191  [pdf, other]

    cs.SD eess.AS

    DualStream Contextual Fusion Network: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions

    Authors: Ke Xue, Rongfei Fan, Shanping Yu, Chang Sun, Jianping An

    Abstract: Target speaker extraction focuses on extracting a target speech signal from an environment with multiple speakers by leveraging an enrollment. Existing methods predominantly rely on speaker embeddings obtained from the enrollment, potentially disregarding the contextual information and the internal interactions between the mixture and enrollment. In this paper, we propose a novel DualStream Contex…

    Submitted 12 February, 2025; originally announced February 2025.

  9. arXiv:2502.05119  [pdf]

    eess.IV cs.CV

    Investigating the impact of kernel harmonization and deformable registration on inspiratory and expiratory chest CT images for people with COPD

    Authors: Aravind R. Krishnan, Yihao Liu, Kaiwen Xu, Michael E. Kim, Lucas W. Remedios, Gaurav Rudravaram, Adam M. Saunders, Bradley W. Richmond, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman, Lianrui Zuo

    Abstract: Paired inspiratory-expiratory CT scans enable the quantification of gas trapping due to small airway disease and emphysema by analyzing lung tissue motion in COPD patients. Deformable image registration of these scans assesses regional lung volumetric changes. However, variations in reconstruction kernels between paired scans introduce errors in quantitative analysis. This work proposes a two-stag…

    Submitted 7 February, 2025; originally announced February 2025.

    Comments: Accepted at SPIE Medical Imaging 2025, Clinical and Biomedical Imaging

  10. arXiv:2501.18834  [pdf]

    eess.IV cs.AI cs.CV

    Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential

    Authors: Chenyu Gao, Kaiwen Xu, Michael E. Kim, Lianrui Zuo, Zhiyuan Li, Derek B. Archer, Timothy J. Hohman, Ann Zenobia Moore, Luigi Ferrucci, Lori L. Beason-Held, Susan M. Resnick, Christos Davatzikos, Jerry L. Prince, Bennett A. Landman

    Abstract: Defacing is often applied to head magnetic resonance image (MRI) datasets prior to public release to address privacy concerns. The alteration of facial and nearby voxels has provoked discussions about the true capability of these techniques to ensure privacy as well as their impact on downstream tasks. With advancements in deep generative models, the extent to which defacing can protect privacy is…

    Submitted 30 January, 2025; originally announced January 2025.

  11. arXiv:2501.15177  [pdf, other]

    cs.SD cs.MM eess.AS

    Audio-Language Models for Audio-Centric Tasks: A survey

    Authors: Yi Su, Jisheng Bai, Qisheng Xu, Kele Xu, Yong Dou

    Abstract: Audio-Language Models (ALMs), which are trained on audio-text data, focus on the processing, understanding, and reasoning of sounds. Unlike traditional supervised learning approaches learning from predefined labels, ALMs utilize natural language as a supervision signal, which is more suitable for describing complex real-world audio recordings. ALMs demonstrate strong zero-shot capabilities and can…

    Submitted 25 January, 2025; originally announced January 2025.

  12. arXiv:2501.14350  [pdf, other]

    eess.AS cs.SD

    FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

    Authors: Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu

    Abstract: We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants: FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder…

    Submitted 24 January, 2025; originally announced January 2025.

  13. arXiv:2501.13071  [pdf]

    cs.CV eess.IV

    Robust Body Composition Analysis by Generating 3D CT Volumes from Limited 2D Slices

    Authors: Lianrui Zuo, Xin Yu, Dingjie Su, Kaiwen Xu, Aravind R. Krishnan, Yihao Liu, Shunxing Bao, Fabien Maldonado, Luigi Ferrucci, Bennett A. Landman

    Abstract: Body composition analysis provides valuable insights into aging, disease progression, and overall health conditions. Due to concerns of radiation exposure, two-dimensional (2D) single-slice computed tomography (CT) imaging has been used repeatedly for body composition analysis. However, this approach introduces significant spatial variability that can impact the accuracy and robustness of the anal…

    Submitted 22 January, 2025; originally announced January 2025.

  14. arXiv:2501.13068  [pdf]

    cs.CV eess.IV

    Beyond the Lungs: Extending the Field of View in Chest CT with Latent Diffusion Models

    Authors: Lianrui Zuo, Kaiwen Xu, Dingjie Su, Xin Yu, Aravind R. Krishnan, Yihao Liu, Shunxing Bao, Thomas Li, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman

    Abstract: The interconnection between the human lungs and other organs, such as the liver and kidneys, is crucial for understanding the underlying risks and effects of lung diseases and improving patient care. However, most research chest CT imaging is focused solely on the lungs due to considerations of cost and radiation dose. This restricted field of view (FOV) in the acquired images poses challenges to…

    Submitted 22 January, 2025; originally announced January 2025.

  15. arXiv:2501.08518  [pdf, other]

    cs.HC cs.AI eess.SP q-bio.QM

    Easing Seasickness through Attention Redirection with a Mindfulness-Based Brain--Computer Interface

    Authors: Xiaoyu Bao, Kailin Xu, Jiawei Zhu, Haiyun Huang, Kangning Li, Qiyun Huang, Yuanqing Li

    Abstract: Seasickness is a prevalent issue that adversely impacts both passenger experiences and the operational efficiency of maritime crews. While techniques that redirect attention have proven effective in alleviating motion sickness symptoms in terrestrial environments, applying similar strategies to manage seasickness poses unique challenges due to the prolonged and intense motion environment associate…

    Submitted 14 January, 2025; originally announced January 2025.

  16. arXiv:2412.11907  [pdf, other]

    cs.SD eess.AS

    AudioCIL: A Python Toolbox for Audio Class-Incremental Learning with Multiple Scenes

    Authors: Qisheng Xu, Yulin Sun, Yi Su, Qian Zhu, Xiaoyi Tan, Hongyu Wen, Zijian Gao, Kele Xu, Yong Dou, Dawei Feng

    Abstract: Deep learning, with its robust automatic feature extraction capabilities, has demonstrated significant success in audio signal processing. Typically, these methods rely on static, pre-collected large-scale datasets for training, performing well on a fixed number of classes. However, the real world is characterized by constant change, with new audio classes emerging from streaming or temporary avai…

    Submitted 18 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

  17. arXiv:2412.05167  [pdf, other]

    cs.AI cs.CL cs.SD eess.AS

    Benchmarking Open-ended Audio Dialogue Understanding for Large Audio-Language Models

    Authors: Kuofeng Gao, Shu-Tao Xia, Ke Xu, Philip Torr, Jindong Gu

    Abstract: Large Audio-Language Models (LALMs) have unlocked audio dialogue capabilities, where audio dialogues are a direct exchange of spoken language between LALMs and humans. Recent advances, such as GPT-4o, have enabled LALMs in back-and-forth audio dialogues with humans. This progression not only underscores the potential of LALMs but also broadens their applicability across a wide range of practical…

    Submitted 6 December, 2024; originally announced December 2024.

  18. arXiv:2412.00085  [pdf]

    cs.CV eess.IV

    Residual Attention Single-Head Vision Transformer Network for Rolling Bearing Fault Diagnosis in Noisy Environments

    Authors: Songjiang Lai, Tsun-Hin Cheung, Jiayi Zhao, Kaiwen Xue, Ka-Chun Fung, Kin-Man Lam

    Abstract: Rolling bearings play a crucial role in industrial machinery, directly influencing equipment performance, durability, and safety. However, harsh operating conditions, such as high speeds and temperatures, often lead to bearing malfunctions, resulting in downtime, economic losses, and safety hazards. This paper proposes the Residual Attention Single-Head Vision Transformer Network (RA-SHViT-Net) fo…

    Submitted 26 November, 2024; originally announced December 2024.

    Comments: 24 pages, 14 figures, 3 tables

  19. arXiv:2411.18003  [pdf, other]

    eess.IV cs.AI cs.CV

    HAAT: Hybrid Attention Aggregation Transformer for Image Super-Resolution

    Authors: Song-Jiang Lai, Tsun-Hin Cheung, Ka-Chun Fung, Kai-wen Xue, Kin-Man Lam

    Abstract: In the research area of image super-resolution, Swin-transformer-based models are favored for their global spatial modeling and shifting window attention mechanism. However, existing methods often limit self-attention to non-overlapping windows to cut costs and ignore the useful information that exists across channels. To address this issue, this paper introduces a novel model, the Hybrid Attentio…

    Submitted 10 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: 6 pages, 2 figures, 1 table

  20. arXiv:2411.10775  [pdf, other]

    eess.IV cs.CV cs.MM

    Beyond Feature Mapping GAP: Integrating Real HDRTV Priors for Superior SDRTV-to-HDRTV Conversion

    Authors: Kepeng Xu, Li Xu, Gang He, Zhiqiang Zhang, Wenxin Yu, Shihao Wang, Dajiang Zhou, Yunsong Li

    Abstract: The rise of HDR-WCG display devices has highlighted the need to convert SDRTV to HDRTV, as most video sources are still in SDR. Existing methods primarily focus on designing neural networks to learn a single-style mapping from SDRTV to HDRTV. However, the limited information in SDRTV and the diversity of styles in real-world conversions render this process an ill-posed problem, thereby constrainin…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: 8 pages, 4 figures

  21. arXiv:2411.10773  [pdf, other]

    eess.IV cs.CV

    An End-to-End Real-World Camera Imaging Pipeline

    Authors: Kepeng Xu, Zijia Ma, Li Xu, Gang He, Yunsong Li, Wenxin Yu, Taichu Han, Cheng Yang

    Abstract: Recent advances in neural camera imaging pipelines have demonstrated notable progress. Nevertheless, the real-world imaging pipeline still faces challenges including the lack of joint optimization in system components, computational redundancies, and optical distortions such as lens shading. In light of this, we propose an end-to-end camera imaging pipeline (RealCamNet) to enhance real-world camera…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: Accepted by ACMMM 2024

  22. arXiv:2411.04398  [pdf, ps, other]

    eess.SP

    Radio-Based Passive Target Tracking by a Mobile Receiver with Unknown Transmitter Position

    Authors: Ke Xu, Rui Zhang, He, Chen

    Abstract: In this paper, we propose a radio-based passive target tracking algorithm using multipath measurements, including the angle of arrival and relative distance. We focus on a scenario in which a mobile receiver continuously receives radio signals from a transmitter located at an unknown position. The receiver utilizes multipath measurements extracted from the received signal to jointly localize the t…

    Submitted 6 November, 2024; originally announced November 2024.

  23. arXiv:2410.19415  [pdf]

    eess.IV cs.CV eess.SP

    Integration of Communication and Computational Imaging

    Authors: Zhenming Yu, Liming Cheng, Hongyu Huang, Wei Zhang, Liang Lin, Kun Xu

    Abstract: Communication enables the expansion of human visual perception beyond the limitations of time and distance, while computational imaging overcomes the constraints of depth and breadth. Although impressive achievements have been witnessed with the two types of technologies, the occlusive information flow between the two domains is a bottleneck hindering their ulterior progression. Herein, we propose…

    Submitted 29 October, 2024; v1 submitted 25 October, 2024; originally announced October 2024.

  24. arXiv:2410.18582  [pdf, other]

    eess.SY

    LLM-Aided Efficient Hardware Design Automation

    Authors: Kangwei Xu, Ruidi Qiu, Zhuorui Zhao, Grace Li Zhang, Ulf Schlichtmann, Bing Li

    Abstract: With the rapidly increasing complexity of modern chips, hardware engineers are required to invest more effort in tasks such as circuit design, verification, and physical implementation. These workflows often involve continuous modifications, which are labor-intensive and prone to errors. Therefore, there is an increasing need for more efficient and cost-effective Electronic Design Automation (EDA)…

    Submitted 24 October, 2024; originally announced October 2024.

  25. arXiv:2409.14330  [pdf, other]

    eess.IV cs.CV

    Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues

    Authors: Mingshen Wang, Zhao Zhang, Feng Li, Ke Xu, Kang Miao, Meng Wang

    Abstract: Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Existing methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the trade-off of SR accura…

    Submitted 22 December, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: AAAI 2025

  26. Atmospheric Turbulence-Immune Free Space Optical Communication System based on Discrete-Time Analog Transmission

    Authors: Hongyu Huang, Zhenming Yu, Yi Lei, Wei Zhang, Yongli Zhao, Shanguo Huang, Kun Xu

    Abstract: To effectively mitigate the influence of atmospheric turbulence, a novel discrete-time analog transmission free-space optical (DTAT-FSO) communication scheme is proposed. It directly maps information sources to discrete-time analog symbols via joint source-channel coding and modulation. Differently from traditional digital free space optical (TD-FSO) schemes, the proposed DTAT-FSO approach can aut…

    Submitted 18 September, 2024; originally announced September 2024.

  27. arXiv:2409.03283  [pdf, other]

    cs.SD eess.AS

    FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications

    Authors: Hao-Han Guo, Yao Hu, Kun Liu, Fei-Yu Shen, Xu Tang, Yi-Chen Wu, Feng-Long Xie, Kun Xie, Kai-Tuo Xu

    Abstract: This work proposes FireRedTTS, a foundation text-to-speech framework, to meet the growing demands for personalized and diverse generative speech applications. The framework comprises three parts: data processing, foundation system, and downstream applications. First, we comprehensively present our data processing pipeline, which transforms massive raw audio into a large-scale high-quality TTS data…

    Submitted 11 April, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  28. arXiv:2408.10636  [pdf]

    eess.IV cs.CV

    UWF-RI2FA: Generating Multi-frame Ultrawide-field Fluorescein Angiography from Ultrawide-field Retinal Imaging Improves Diabetic Retinopathy Stratification

    Authors: Ruoyu Chen, Kezheng Xu, Kangyan Zheng, Weiyi Zhang, Yan Lu, Danli Shi, Mingguang He

    Abstract: Ultrawide-field fluorescein angiography (UWF-FA) facilitates diabetic retinopathy (DR) detection by providing a clear visualization of peripheral retinal lesions. However, the intravenous dye injection with potential risks hampers its application. We aim to acquire dye-free UWF-FA images from noninvasive UWF retinal imaging (UWF-RI) using generative artificial intelligence (GenAI) and evaluate its…

    Submitted 27 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 22 pages, 2 figures

  29. arXiv:2408.02025  [pdf, other]

    cs.SD cs.AI eess.AS

    Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association

    Authors: Wuyang Chen, Yanjie Sun, Kele Xu, Yong Dou

    Abstract: The innate correlation between a person's face and voice has recently emerged as a compelling area of study, especially within the context of multilingual environments. This paper introduces our novel solution to the Face-Voice Association in Multilingual Environments (FAME) 2024 challenge, focusing on a contrastive learning-based chaining-cluster method to enhance face-voice association. This tas…

    Submitted 19 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

  30. Precoding Based Downlink OAM-MIMO Communications with Rate Splitting

    Authors: Ruirui Chen, Jinyang Lin, Beibei Zhang, Yu Ding, Keyue Xu

    Abstract: Orbital angular momentum (OAM) and rate splitting (RS) are the potential key techniques for the future wireless communications. As a new orthogonal resource, OAM can achieve the multifold increase of spectrum efficiency to relieve the scarcity of the spectrum resource, but how to enhance the privacy performance imposes crucial challenge for OAM communications. RS technique divides the information…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON BROADCASTING, VOL. 69, NO. 4, DECEMBER 2023

  31. arXiv:2407.17841  [pdf, ps, other]

    cs.IT eess.SP

    Two-Timescale Design for Movable Antenna Array-Enabled Multiuser Uplink Communications

    Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

    Abstract: Movable antenna (MA) technology can flexibly reconfigure wireless channels by adjusting antenna positions in a local region, thus owing great potential for enhancing communication performance. This letter investigates MA technology enabled multiuser uplink communications over general Rician fading channels, which consist of a base station (BS) equipped with the MA array and multiple single-antenna…

    Submitted 25 July, 2024; originally announced July 2024.

  32. Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

    Authors: Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

    Abstract: The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads. This paper presents a real-time and compact edge-AI enabled detector designed to identify chickens and their healthy statuses using frames captured by a lightweight and intelligent camera equipped with an edge-AI enabled CMOS…

    Submitted 5 November, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  33. arXiv:2407.07453  [pdf, other]

    physics.optics eess.SP

    Waveguide Superlattices with Artificial Gauge Field Towards Colorless and Crosstalkless Ultrahigh-Density Photonic Integration

    Authors: Xuelin Zhang, Jiangbing Du, Ke Xu, Zuyuan He

    Abstract: Dense waveguides are the basic building blocks for photonic integrated circuits (PIC). Due to the rapidly increasing scale of PIC chips, high-density integration of waveguide arrays working with low crosstalk over broadband wavelength range is highly desired. However, the sub-wavelength regime of such structures has not been adequately explored in practice. Herein, we proposed a waveguide superlat…

    Submitted 30 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  34. arXiv:2407.03889  [pdf, other]

    eess.SY

    Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models

    Authors: Kangwei Xu, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li

    Abstract: In High-Level Synthesis (HLS), converting a regular C/C++ program into its HLS-compatible counterpart (HLS-C) still requires tremendous manual effort. Various program scripts have been introduced to automate this process. But the resulting codes usually contain many issues that should be manually repaired by developers. Since Large Language Models (LLMs) have the ability to automate code generatio…

    Submitted 4 July, 2024; originally announced July 2024.

  35. arXiv:2407.03753  [pdf]

    eess.SP

    Enhanced Support Vector Machine Based Signal Recovery in Bandwidth-Limited 50-100 Gbit/s Flexible DS-PON

    Authors: Liyan Wu, Yanlu Huang, Kai Jin, Shangya Han, Kun Xu, Yanni Ou

    Abstract: We proposed an adaptive signal recovery algorithm with reduced complexity based on the SVM principle for flexible downstream PON. Experimental results indicate a record-high link power budget of 24 dB for bandwidth-limited 100 Gbit/s direct-detection transmission@1E-3.

    Submitted 14 February, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: We propose SVM algorithms with different solvers for signal formats like NRZ and PAM4. This simplifies complexity in flexible downstream PON while maintaining performance

  36. arXiv:2406.19856  [pdf]

    eess.SP

    LUT-Assisted Clock Data Recovery and Equalization for Burst-Mode 50-100 Gbit/s Bandwidth-Limited Flexible PON

    Authors: Yanlu Huang, Liyan Wu, Shangya Han, Kai Jin, Kun Xu, Yanni Ou

    Abstract: We demonstrated LUT-assisted CDR and equalization for burst-mode 50-100 Gbit/s bandwidth-limited PON, achieving signal recovery under large 100 ppm frequency offsets and 0.5 UI phase mismatch using reduced 50ns preambles, with 0.3dB sensitivity penalty only.

    Submitted 14 February, 2025; v1 submitted 28 June, 2024; originally announced June 2024.

  37. arXiv:2406.14794  [pdf, other]

    eess.IV cs.CV cs.LG

    ImageFlowNet: Forecasting Multiscale Image-Level Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

    Authors: Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

    Abstract: Advances in medical imaging technologies have enabled the collection of longitudinal images, which involve repeated scanning of the same patients over time, to monitor disease progression. However, predictive modeling of such data remains challenging due to high dimensionality, irregular sampling, and data sparsity. To address these issues, we propose ImageFlowNet, a novel model designed to foreca…

    Submitted 24 April, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: ICASSP 2025, Oral Presentation

  38. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  39. arXiv:2405.16011  [pdf, ps, other]

    eess.SP

    Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models

    Authors: Shuaishuai Guo, Yanhu Wang, Jia Ye, Anbang Zhang, Kun Xu

    Abstract: Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visua…

    Submitted 24 May, 2024; originally announced May 2024.

  40. arXiv:2405.01961  [pdf, other]

    eess.SP

    Rescale-Invariant Federated Reinforcement Learning for Resource Allocation in V2X Networks

    Authors: Kaidi Xu, Shenglong Zhou, Geoffrey Ye Li

    Abstract: Federated Reinforcement Learning (FRL) offers a promising solution to various practical challenges in resource allocation for vehicle-to-everything (V2X) networks. However, the data discrepancy among individual agents can significantly degrade the performance of FRL-based algorithms. To address this limitation, we exploit the node-wise invariance property of ReLU-activated neural networks, with th…

    Submitted 3 May, 2024; originally announced May 2024.

  41. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  42. arXiv:2404.13640  [pdf, other]

    cs.MM cs.CV eess.IV

    Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

    Authors: Kepeng Xu, Li Xu, Gang He, Wenxin Yu, Yunsong Li

    Abstract: Multiple complex degradations are coupled in low-quality video faces in the real world. Therefore, blind video face restoration is a highly challenging ill-posed problem, requiring not only hallucinating high-fidelity details but also enhancing temporal coherence across diverse pose variations. Restoring each frame independently in a naive manner inevitably introduces temporal incoherence and arti…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 9 pages

  43. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  44. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  45. arXiv:2403.15853  [pdf]

    eess.IV cs.CV

    An edge detection-based deep learning approach for tear meniscus height measurement

    Authors: Kesheng Wang, Kunhui Xu, Xiaoyu Chen, Chunlei He, Jianfeng Zhang, Dexing Kong, Qi Dai, Shoujun Huang

    Abstract: Automatic measurements of tear meniscus height (TMH) have been achieved by using deep learning techniques; however, annotation is significantly influenced by subjective factors and is both time-consuming and labor-intensive. In this paper, we introduce an automatic TMH measurement technique based on edge detection-assisted annotation within a deep learning framework. This method generates mask lab…

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 22 pages, 5 figures

  46. arXiv:2403.10573  [pdf, other]

    eess.IV cs.CR cs.CV cs.LG

    Medical Unlearnable Examples: Securing Medical Data from Unauthorized Training via Sparsity-Aware Local Masking

    Authors: Weixiang Sun, Yixin Liu, Zhiling Yan, Kaidi Xu, Lichao Sun

    Abstract: The rapid expansion of AI in healthcare has led to a surge in medical data generation and storage, boosting medical AI development. However, fears of unauthorized use, like training commercial AI models, hinder researchers from sharing their valuable datasets. To encourage data sharing, one promising solution is to introduce imperceptible noise into the data. This method aims to safeguard the data…

    Submitted 7 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by ICML 2024 NextGenAISafety

  47. arXiv:2403.09923  [pdf, other]

    eess.SY

    Optimal Sequencing and Motion Control in a Roundabout with Safety Guarantees

    Authors: Yingqing Chen, Christos G. Cassandras, Kaiyuan Xu

    Abstract: This paper develops a controller for Connected and Automated Vehicles (CAVs) traversing a single-lane roundabout. The controller simultaneously determines the optimal sequence and associated optimal motion control jointly minimizing travel time and energy consumption while providing speed-dependent safety guarantees, as well as satisfying velocity and acceleration constraints. This is achieved by…

    Submitted 19 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  48. arXiv:2403.07274  [pdf, other]

    cs.IT eess.SP

    Achievable Rate Analysis and Optimization of Double-RIS Assisted Spatially Correlated MIMO with Statistical CSI

    Authors: Kaizhe Xu, Jiajia Guo, Jun Zhang, Shi Jin, Shaodan Ma

    Abstract: Reconfigurable intelligent surface (RIS) is a novel meta-material which can form a smart radio environment by dynamically altering reflection directions of the impinging electromagnetic waves. In the prior literature, the inter-RIS links which also contribute to the performance of the whole system are usually neglected when multiple RISs are deployed. In this paper we investigate a general double-…

    Submitted 11 March, 2024; originally announced March 2024.

  49. arXiv:2402.13276  [pdf, other]

    eess.AS cs.AI cs.SD

    When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

    Authors: Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

    Abstract: Depression is a critical concern in global mental health, prompting extensive research into AI-based detection methods. Among various AI technologies, Large Language Models (LLMs) stand out for their versatility in mental healthcare applications. However, their primary limitation arises from their exclusive dependence on textual input, which constrains their overall capabilities. Furthermore, the…

    Submitted 23 September, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  50. arXiv:2401.14248  [pdf]

    eess.IV cs.CV

    On generalisability of segment anything model for nuclear instance segmentation in histology images

    Authors: Kesi Xu, Lea Goetz, Nasir Rajpoot

    Abstract: Pre-trained on a large and diverse dataset, the segment anything model (SAM) is the first promptable foundation model in computer vision aiming at object segmentation tasks. In this work, we evaluate SAM for the task of nuclear instance segmentation performance with zero-shot learning and finetuning. We compare SAM with other representative methods in nuclear instance segmentation, especially in t…

    Submitted 25 January, 2024; originally announced January 2024.