Skip to main content

Showing 1–50 of 123 results for author: Jiang, S

Searching in archive eess. Search in all archives.
.
  1. arXiv:2507.01173  [pdf

    eess.SY

    An Adaptive Estimation Approach based on Fisher Information to Overcome the Challenges of LFP Battery SOC Estimation

    Authors: Junzhe Shi, Shida Jiang, Shengyu Tao, Jaewong Lee, Manashita Borah, Scott Moura

    Abstract: Robust and Real-time State of Charge (SOC) estimation is essential for Lithium Iron Phosphate (LFP) batteries, which are widely used in electric vehicles (EVs) and energy storage systems due to safety and longevity. However, the flat Open Circuit Voltage (OCV)-SOC curve makes this task particularly challenging. This challenge is complicated by hysteresis effects, and real-world conditions such as… ▽ More

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2506.09162  [pdf

    eess.IV cs.CV

    The RSNA Lumbar Degenerative Imaging Spine Classification (LumbarDISC) Dataset

    Authors: Tyler J. Richards, Adam E. Flanders, Errol Colak, Luciano M. Prevedello, Robyn L. Ball, Felipe Kitamura, John Mongan, Maryam Vazirabad, Hui-Ming Lin, Anne Kendell, Thanat Kanthawang, Salita Angkurawaranon, Emre Altinmakas, Hakan Dogan, Paulo Eduardo de Aguiar Kuriki, Arjuna Somasundaram, Christopher Ruston, Deniz Bulja, Naida Spahovic, Jennifer Sommer, Sirui Jiang, Eduardo Moreno Judice de Mattos Farina, Eduardo Caminha Nunes, Michael Brassil, Megan McNamara , et al. (11 additional authors not shown)

    Abstract: The Radiological Society of North America (RSNA) Lumbar Degenerative Imaging Spine Classification (LumbarDISC) dataset is the largest publicly available dataset of adult MRI lumbar spine examinations annotated for degenerative changes. The dataset includes 2,697 patients with a total of 8,593 image series from 8 institutions across 6 countries and 5 continents. The dataset is available for free fo… ▽ More

    Submitted 10 June, 2025; originally announced June 2025.

  3. arXiv:2506.08967  [pdf, ps, other

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang , et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du… ▽ More

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  4. arXiv:2506.02197  [pdf, ps, other

    eess.IV cs.CV

    NTIRE 2025 Challenge on RAW Image Restoration and Super-Resolution

    Authors: Marcos V. Conde, Radu Timofte, Zihao Lu, Xiangyu Kong, Xiaoxia Xing, Fan Wang, Suejin Han, MinKyu Park, Tianyu Zhang, Xin Luo, Yeda Chen, Dong Liu, Li Pang, Yuhang Yang, Hongzhong Wang, Xiangyong Cao, Ruixuan Jiang, Senyan Xu, Siyuan Jiang, Xueyang Fu, Zheng-Jun Zha, Tianyu Hao, Yuhong He, Ruoqi Li, Yueqi Yang , et al. (14 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 RAW Image Restoration and Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Restoration and Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is two fold, (i) restore RAW images with blur and… ▽ More

    Submitted 4 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

    Comments: CVPR 2025 - New Trends in Image Restoration and Enhancement (NTIRE)

  5. arXiv:2505.20926  [pdf

    cs.RO eess.SY

    COM Adjustment Mechanism Control for Multi-Configuration Motion Stability of Unmanned Deformable Vehicle

    Authors: Jun Liu, Hongxun Liu, Cheng Zhang, Jiandang Xing, Shang Jiang, Ping Jiang

    Abstract: An unmanned deformable vehicle is a wheel-legged robot transforming between two configurations: vehicular and humanoid states, with different motion modes and stability characteristics. To address motion stability in multiple configurations, a center-of-mass adjustment mechanism was designed. Further, a motion stability hierarchical control algorithm was proposed, and an electromechanical model ba… ▽ More

    Submitted 27 May, 2025; originally announced May 2025.

  6. arXiv:2505.20059  [pdf, other

    eess.IV

    LPCM: Learning-based Predictive Coding for LiDAR Point Cloud Compression

    Authors: Chang Sun, Hui Yuan, Shiqi Jiang, Da Ai, Wei Zhang, Raouf Hamzaoui

    Abstract: Since the data volume of LiDAR point clouds is very huge, efficient compression is necessary to reduce their storage and transmission costs. However, existing learning-based compression methods do not exploit the inherent angular resolution of LiDAR and ignore the significant differences in the correlation of geometry information at different bitrates. The predictive geometry coding method in the… ▽ More

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 13 pages long, 8 figures and over 50 references. Submitted with IEEEtran journal mode. All figures are included in PDF format and the bibliography is resolved manually

  7. arXiv:2505.07916  [pdf, ps, other

    eess.AS cs.SD

    MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

    Authors: Bowen Zhang, Congchao Guo, Geng Yang, Hang Yu, Haozhe Zhang, Heidi Lei, Jialong Mai, Junjie Yan, Kaiyue Yang, Mingqi Yang, Peikai Huang, Ruiyang Jin, Sitan Jiang, Weihua Cheng, Yawei Li, Yichen Xiao, Yiying Zhou, Yongmao Zhang, Yuan Lu, Yucen He

    Abstract: We introduce MiniMax-Speech, an autoregressive Transformer-based Text-to-Speech (TTS) model that generates high-quality speech. A key innovation is our learnable speaker encoder, which extracts timbre features from a reference audio without requiring its transcription. This enables MiniMax-Speech to produce highly expressive speech with timbre consistent with the reference in a zero-shot manner, w… ▽ More

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.00742  [pdf, other

    cs.CV cs.AI eess.IV

    Zoomer: Adaptive Image Focus Optimization for Black-box MLLM

    Authors: Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, Ting Cao, Tianjun Mao, Suman Banerjee, Guyue Liu, Saravan Rajmohan, Dongmei Zhang, Yuqing Yang, Qi Zhang, Lili Qiu

    Abstract: Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omi… ▽ More

    Submitted 29 April, 2025; originally announced May 2025.

  9. arXiv:2504.20383  [pdf, other

    cs.CV eess.IV

    Neural Stereo Video Compression with Hybrid Disparity Compensation

    Authors: Shiyin Jiang, Zhenghao Chen, Minghao Han, Xingyu Zhou, Leheng Zhang, Shuhang Gu

    Abstract: Disparity compensation represents the primary strategy in stereo video compression (SVC) for exploiting cross-view redundancy. These mechanisms can be broadly categorized into two types: one that employs explicit horizontal shifting, and another that utilizes an implicit cross-attention mechanism to reduce cross-view disparity redundancy. In this work, we propose a hybrid disparity compensation (H… ▽ More

    Submitted 28 April, 2025; originally announced April 2025.

  10. arXiv:2502.15188  [pdf, other

    eess.IV cs.CV

    Interleaved Block-based Learned Image Compression with Feature Enhancement and Quantization Error Compensation

    Authors: Shiqi Jiang, Hui Yuan, Shuai Li, Raouf Hamzaoui, Xu Wang, Junyan Huo

    Abstract: In recent years, learned image compression (LIC) methods have achieved significant performance improvements. However, obtaining a more compact latent representation and reducing the impact of quantization errors remain key challenges in the field of LIC. To address these challenges, we propose a feature extraction module, a feature refinement module, and a feature enhancement module. Our feature e… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

  11. arXiv:2502.15174  [pdf, other

    eess.IV cs.CV

    FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression

    Authors: Shiqi Jiang, Hui Yuan, Shuai Li, Huanqiang Zeng, Sam Kwong

    Abstract: The learned image compression (LIC) methods have already surpassed traditional techniques in compressing natural scene (NS) images. However, directly applying these methods to screen content (SC) images, which possess distinct characteristics such as sharp edges, repetitive patterns, embedded text and graphics, yields suboptimal results. This paper addresses three key challenges in SC image compre… ▽ More

    Submitted 20 February, 2025; originally announced February 2025.

  12. arXiv:2502.11946  [pdf, other

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu , et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu… ▽ More

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  13. arXiv:2501.15833   

    eess.SY

    Mode Switching-Induced Instability of Multi-source Feed DC Microgrid

    Authors: Shanshan Jiang, Zelin Sun, Jiankun Zhang, Hua Geng

    Abstract: In DC microgrids (DCMGs), DC-bus signaling based control strategy is extensively used for power management, where mode switching plays a crucial role in achieving multi-source coordination. However, few studies have noticed the impact of mode switching and switching strategies on system voltage stability. To fill this gap, this paper aims to provide a general analysis framework for mode switching-… ▽ More

    Submitted 10 April, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: This submission is being withdrawn due to the need for major structural revisions to the manuscript. All authors have agreed that substantial modifications are required to improve the work's rigor and completeness. A revised version incorporating these improvements will be submitted subsequently

  14. arXiv:2501.01392  [pdf, other

    eess.IV cs.CV

    ProjectedEx: Enhancing Generation in Explainable AI for Prostate Cancer

    Authors: Xuyin Qi, Zeyu Zhang, Aaron Berliano Handoko, Huazhan Zheng, Mingxi Chen, Ta Duc Huy, Vu Minh Hieu Phan, Lei Zhang, Linqi Cheng, Shiyu Jiang, Zhiwei Zhang, Zhibin Liao, Yang Zhao, Minh-Son To

    Abstract: Prostate cancer, a growing global health concern, necessitates precise diagnostic tools, with Magnetic Resonance Imaging (MRI) offering high-resolution soft tissue imaging that significantly enhances diagnostic accuracy. Recent advancements in explainable AI and representation learning have significantly improved prostate cancer diagnosis by enabling automated and precise lesion classification. Ho… ▽ More

    Submitted 2 January, 2025; originally announced January 2025.

  15. arXiv:2412.19811  [pdf, other

    cs.NI cs.AI eess.SY

    LINKs: Large Language Model Integrated Management for 6G Empowered Digital Twin NetworKs

    Authors: Shufan Jiang, Bangyan Lin, Yue Wu, Yuan Gao

    Abstract: In the rapidly evolving landscape of digital twins (DT) and 6G networks, the integration of large language models (LLMs) presents a novel approach to network management. This paper explores the application of LLMs in managing 6G-empowered DT networks, with a focus on optimizing data retrieval and communication efficiency in smart city scenarios. The proposed framework leverages LLMs for intelligen… ▽ More

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Accepted by The 2024 IEEE 100th Vehicular Technology Conference (VTC2024-Fall)

  16. arXiv:2412.07804  [pdf, other

    eess.IV cs.AI cs.CV

    XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder

    Authors: Shenghao Zhu, Yifei Chen, Shuo Jiang, Weihong Chen, Chang Liu, Yuanhan Wang, Xu Chen, Yifan Ke, Feiwei Qin, Changmiao Wang, Zhu Zhu

    Abstract: Neurogliomas are among the most aggressive forms of cancer, presenting considerable challenges in both treatment and monitoring due to their unpredictable biological behavior. Magnetic resonance imaging (MRI) is currently the preferred method for diagnosing and monitoring gliomas. However, the lack of specific imaging techniques often compromises the accuracy of tumor segmentation during the imagi… ▽ More

    Submitted 5 March, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: 5 pages, 2 figures

    Journal ref: ISBI 2025

  17. arXiv:2412.07180  [pdf, other

    eess.SP cs.IT

    Digital Twin Assisted Beamforming Design for Integrated Sensing and Communication Systems

    Authors: Shuaifeng Jiang, Ahmed Alkhateeb

    Abstract: This paper explores a novel research direction where a digital twin is leveraged to assist the beamforming design for an integrated sensing and communication (ISAC) system. In this setup, a base station designs joint communication and sensing beamforming to serve the communication user and detect the sensing target concurrently. Utilizing the electromagnetic (EM) 3D model of the environment and ra… ▽ More

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Presented at Asilomar 2024. Code and dataset will be available on the paper page: https://www.wi-lab.net/research/digital-twin-assisted-beamforming-design-for-integrated-sensing-and-communication-systems/

  18. arXiv:2412.02612  [pdf, other

    cs.CL cs.SD eess.AS

    GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot

    Authors: Aohan Zeng, Zhengxiao Du, Mingdao Liu, Kedong Wang, Shengmin Jiang, Lei Zhao, Yuxiao Dong, Jie Tang

    Abstract: We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It supports both Chinese and English, engages in real-time voice conversations, and varies vocal nuances such as emotion, intonation, speech rate, and dialect according to user instructions. GLM-4-Voice uses an ultra-low bitrate (175bps), single-codebook speech tokenizer with 12.5Hz frame rate derived from an automa… ▽ More

    Submitted 3 December, 2024; originally announced December 2024.

  19. arXiv:2411.17607  [pdf, other

    cs.CL cs.SD eess.AS

    Scaling Speech-Text Pre-training with Synthetic Interleaved Data

    Authors: Aohan Zeng, Zhengxiao Du, Mingdao Liu, Lei Zhang, Shengmin Jiang, Yuxiao Dong, Jie Tang

    Abstract: Speech language models (SpeechLMs) accept speech input and produce speech output, allowing for more natural human-computer interaction compared to text-based large language models (LLMs). Traditional approaches for developing SpeechLMs are constrained by the limited availability of unsupervised speech data and parallel speech-text data, which are significantly less abundant than text pre-training… ▽ More

    Submitted 2 December, 2024; v1 submitted 26 November, 2024; originally announced November 2024.

  20. arXiv:2411.12776  [pdf, other

    eess.IV cs.CR cs.MM

    Cross-Layer Encrypted Semantic Communication Framework for Panoramic Video Transmission

    Authors: Haixiao Gao, Mengying Sun, Xiaodong Xu, Bingxuan Xu, Shujun Han, Bizhu Wang, Sheng Jiang, Chen Dong, Ping Zhang

    Abstract: In this paper, we propose a cross-layer encrypted semantic communication (CLESC) framework for panoramic video transmission, incorporating feature extraction, encoding, encryption, cyclic redundancy check (CRC), and retransmission processes to achieve compatibility between semantic communication and traditional communication systems. Additionally, we propose an adaptive cross-layer transmission me… ▽ More

    Submitted 19 November, 2024; originally announced November 2024.

  21. arXiv:2410.23325  [pdf

    eess.AS cs.AI cs.MM cs.SD

    Transfer Learning in Vocal Education: Technical Evaluation of Limited Samples Describing Mezzo-soprano

    Authors: Zhenyi Hou, Xu Zhao, Kejie Ye, Xinyu Sheng, Shanggerile Jiang, Jiajing Xia, Yitao Zhang, Chenxi Ban, Daijun Luo, Jiaxing Chen, Yan Zou, Yuchao Feng, Guangyu Fan, Xin Yuan

    Abstract: Vocal education in the music field is difficult to quantify due to the individual differences in singers' voices and the different quantitative criteria of singing techniques. Deep learning has great potential to be applied in music education due to its efficiency to handle complex data and perform quantitative analysis. However, accurate evaluations with limited samples over rare vocal types, suc… ▽ More

    Submitted 30 October, 2024; originally announced October 2024.

  22. arXiv:2410.13992  [pdf

    eess.SY

    Resilience-Oriented DG Siting and Sizing Considering Energy Equity Constraint

    Authors: Chenchen Li, Fangxing Li, Sufan Jiang, Jin Zhao, Shiyuan Fan, Leon M. Tolbert

    Abstract: Extreme weather events can cause widespread power outages and huge economic losses. Low-income customers are more vulnerable to power outages because they live in areas with poorly equipped distribution systems. However, existing approaches to improve grid resilience focus on the overall condition of the system and ignore the outage experiences of low-income customers, which leads to significant e… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

  23. arXiv:2410.04847  [pdf, other

    eess.IV cs.CV

    Causal Context Adjustment Loss for Learned Image Compression

    Authors: Minghao Han, Shiyin Jiang, Shengxi Li, Xin Deng, Mai Xu, Ce Zhu, Shuhang Gu

    Abstract: In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance. Most present learned techniques are VAE-based with an autoregressive entropy model, which obviously promotes the RD performance by utilizing the decoded causal context. However, extant methods are highly dependent on the fixed hand-crafted causal c… ▽ More

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  24. arXiv:2409.10293  [pdf, other

    eess.IV cs.CV

    SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds

    Authors: Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong

    Abstract: We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference… ▽ More

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 136pages, 13 figures

  25. arXiv:2409.02564  [pdf, other

    cs.IT eess.SP

    Learnable Wireless Digital Twins: Reconstructing Electromagnetic Field with Neural Representations

    Authors: Shuaifeng Jiang, Qi Qu, Xiaqing Pan, Abhishek Agrawal, Richard Newcombe, Ahmed Alkhateeb

    Abstract: Fully harvesting the gain of multiple-input and multiple-output (MIMO) requires accurate channel information. However, conventional channel acquisition methods mainly rely on pilot training signals, resulting in significant training overheads (time, energy, spectrum). Digital twin-aided communications have been proposed in [1] to reduce or eliminate this overhead by approximating the real world wi… ▽ More

    Submitted 25 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  26. arXiv:2408.15252  [pdf, other

    eess.SP cs.AI

    Generative AI on SpectrumNet: An Open Benchmark of Multiband 3D Radio Maps

    Authors: Shuhang Zhang, Shuai Jiang, Wanjie Lin, Zheng Fang, Kangjun Liu, Hongliang Zhang, Ke Chen

    Abstract: Radio map is an efficient demonstration for visually displaying the wireless signal coverage within a certain region. It has been considered to be increasingly helpful for the future sixth generation (6G) of wireless networks, as wireless nodes are becoming more crowded and complicated. However, the construction of high resolution radio map is very challenging due to the sparse sampling in practic… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 30 pages, 15 figures

  27. arXiv:2408.01127  [pdf, ps, other

    eess.SY

    Relax, Estimate, and Track: a Simple Battery State-of-charge and State-of-health Estimation Method

    Authors: Shida Jiang, Junzhe Shi, Scott Moura

    Abstract: Battery management is a critical component of ubiquitous battery-powered energy systems, in which battery state-of-charge (SOC) and state-of-health (SOH) estimations are of crucial importance. Conventional SOC and SOH estimation methods, especially model-based methods, often lack accurate modeling of the open circuit voltage (OCV), have relatively high computational complexity, and lack theoretica… ▽ More

    Submitted 6 June, 2025; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Minor changes to texts. The codes and dataset are now attached

  28. arXiv:2407.17777  [pdf, other

    cs.AI cs.CV cs.LG eess.SP

    Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment

    Authors: Shenghong Dai, Shiqi Jiang, Yifan Yang, Ting Cao, Mo Li, Suman Banerjee, Lili Qiu

    Abstract: This paper presents Babel, the expandable modality alignment model, specially designed for multi-modal sensing. While there has been considerable work on multi-modality alignment, they all struggle to effectively incorporate multiple sensing modalities due to the data scarcity constraints. How to utilize multi-modal data with partial pairings in sensing remains an unresolved challenge. Babel tackl… ▽ More

    Submitted 21 March, 2025; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by SenSys'25

  29. OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

    Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan

    Abstract: Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. Aiming at addressing the inadequacies of current learned image compression (LIC) methods for SC, we propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction and a cascaded two-stage multi-scale re… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 figures, 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  30. arXiv:2407.08466  [pdf, other

    eess.IV cs.CV

    Global Spatial-Temporal Information-based Residual ConvLSTM for Video Space-Time Super-Resolution

    Authors: Congrui Fu, Hui Yuan, Shiqi Jiang, Guanghui Zhang, Liquan Shen, Raouf Hamzaoui

    Abstract: By converting low-frame-rate, low-resolution videos into high-frame-rate, high-resolution ones, space-time video super-resolution techniques can enhance visual experiences and facilitate more efficient information dissemination. We propose a convolutional neural network (CNN) for space-time video super-resolution, namely GIRNet. To generate highly accurate features and thus improve performance, th… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  31. arXiv:2407.05717  [pdf, ps, other

    eess.SY cs.RO eess.SP

    A New Framework for Nonlinear Kalman Filters

    Authors: Shida Jiang, Junzhe Shi, Scott Moura

    Abstract: The Kalman filter (KF) is a state estimation algorithm that optimally combines system knowledge and measurements to minimize the mean squared error of the estimated states. While KF was initially designed for linear systems, numerous extensions of it, such as extended Kalman filter (EKF), unscented Kalman filter (UKF), cubature Kalman filter (CKF), etc., have been proposed for nonlinear systems ov… ▽ More

    Submitted 19 June, 2025; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Massive revisions to the theoretical analysis part. Now Theorem 1 becomes much stronger and useful

  32. arXiv:2406.17225  [pdf, other

    eess.IV cs.CV

    Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

    Authors: Songhan Jiang, Zhengyu Gan, Linghan Cai, Yifeng Wang, Yongbing Zhang

    Abstract: Survival prediction, utilizing pathological images and genomic profiles, is increasingly important in cancer analysis and prognosis. Despite significant progress, precise survival analysis still faces two main challenges: (1) The massive pixels contained in whole slide images (WSIs) complicate the process of pathological images, making it difficult to generate an effective representation of the tu… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  33. arXiv:2406.07823  [pdf, other

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  34. arXiv:2404.16223  [pdf, other

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu , et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. Th goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  35. arXiv:2403.18339  [pdf, other

    eess.IV cs.CV

    H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images

    Authors: Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang

    Abstract: Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effec… ▽ More

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 10 pages,4 figures

  36. arXiv:2403.06798  [pdf, other

    eess.IV cs.CV cs.LG

    Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification

    Authors: Shuai Li, Xiaoguang Ma, Shancheng Jiang, Lu Meng

    Abstract: Remarkable successes were made in Medical Image Classification (MIC) recently, mainly due to wide applications of convolutional neural networks (CNNs). However, adversarial examples (AEs) exhibited imperceptible similarity with raw data, raising serious concerns on network robustness. Although adversarial training (AT), in responding to malevolent AEs, was recognized as an effective approach to im… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures, 2 tables

  37. arXiv:2402.19434  [pdf, other

    cs.IT eess.SP

    Digital Twin Aided Massive MIMO: CSI Compression and Feedback

    Authors: Shuaifeng Jiang, Ahmed Alkhateeb

    Abstract: Deep learning (DL) approaches have demonstrated high performance in compressing and reconstructing the channel state information (CSI) and reducing the CSI feedback overhead in massive MIMO systems. One key challenge, however, with the DL approaches is the demand for extensive training data. Collecting this real-world CSI data incurs significant overhead that hinders the DL approaches from scaling… ▽ More

    Submitted 29 February, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted in ICC 2024. Dataset and code files will be available soon on the DeepMIMO website https://www.deepmimo.net/

  38. arXiv:2312.17266  [pdf

    eess.IV cs.AI cs.CV cs.RO

    Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

    Authors: Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

    Abstract: Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by… ▽ More

    Submitted 25 December, 2023; originally announced December 2023.

  39. arXiv:2312.05829  [pdf, other

    cs.IT eess.SP

    EM Based p-norm-like Constraint RLS Algorithm for Sparse System Identification

    Authors: Shuyang Jiang, Kung Yao

    Abstract: In this paper, the recursive least squares (RLS) algorithm is considered in the sparse system identification setting. The cost function of RLS algorithm is regularized by a $p$-norm-like ($0 \leq p \leq 1$) constraint of the estimated system parameters. In order to minimize the regularized cost function, we transform it into a penalized maximum likelihood (ML) problem, which is solved by the expec… ▽ More

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 11 pages, 3 figures, journal manuscript

  40. arXiv:2312.03225  [pdf, other

    cs.RO eess.SY

    Snake Robot with Tactile Perception Navigates on Large-scale Challenging Terrain

    Authors: Shuo Jiang, Adarsh Salagame, Alireza Ramezani, Lawson Wong

    Abstract: Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we proposed a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architectu… ▽ More

    Submitted 5 December, 2023; originally announced December 2023.

  41. arXiv:2312.03223  [pdf, other

    cs.RO eess.SY

    Hierarchical RL-Guided Large-scale Navigation of a Snake Robot

    Authors: Shuo Jiang, Adarsh Salagame, Alireza Ramezani, Lawson Wong

    Abstract: Classical snake robot control leverages mimicking snake-like gaits tuned for specific environments. However, to operate adaptively in unstructured environments, gait generation must be dynamically scheduled. In this work, we present a four-layer hierarchical control scheme to enable the snake robot to navigate freely in large-scale environments. The proposed model decomposes navigation into global… ▽ More

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: arXiv admin note: text overlap with arXiv:2311.14878

  42. arXiv:2311.08075  [pdf, ps, other

    eess.IV cs.CV cs.HC

    GlanceSeg: Real-time microaneurysm lesion segmentation with gaze-map-guided foundation model for early detection of diabetic retinopathy

    Authors: Hongyang Jiang, Mengdi Gao, Zirong Liu, Chen Tang, Xiaoqing Zhang, Shuai Jiang, Wu Yuan, Jiang Liu

    Abstract: Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 12 pages, 10 figures

  43. arXiv:2311.05432  [pdf, other

    cs.CV eess.IV

    Dual Pipeline Style Transfer with Input Distribution Differentiation

    Authors: ShiQi Jiang, JunJie Kang, YuJian Li

    Abstract: The color and texture dual pipeline architecture (CTDP) suppresses texture representation and artifacts through masked total variation loss (Mtv), and further experiments have shown that smooth input can almost completely eliminate texture representation. We have demonstrated through experiments that smooth input is not the key reason for removing texture representations, but rather the distributi… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

  44. arXiv:2311.00483  [pdf, other

    eess.IV cs.CV

    DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Indistinct-Boundary Object Segmentation

    Authors: Xiaohua Jiang, Yihao Guo, Jian Huang, Yuting Wu, Meiyi Luo, Zhaoyang Xu, Qianni Zhang, Xingru Huang, Hong He, Shaowei Jiang, Jing Ye, Mang Xiao

    Abstract: The precise spatial and quantitative delineation of indistinct-boundary medical objects is paramount for the accuracy of diagnostic protocols, efficacy of surgical interventions, and reliability of postoperative assessments. Despite their significance, the effective segmentation and instantaneous three-dimensional reconstruction are significantly impeded by the paucity of representative samples in… ▽ More

    Submitted 19 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

    Comments: 36pages,16figures,7tables

    MSC Class: 68; 92 ACM Class: I.4; J.3

  45. arXiv:2310.20427  [pdf, other

    eess.IV cs.CV cs.LG

    Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology

    Authors: Peixiang Huang, Songtao Zhang, Yulu Gan, Rui Xu, Rongqi Zhu, Wenkang Qin, Limei Guo, Shan Jiang, Lin Luo

    Abstract: Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess an… ▽ More

    Submitted 31 October, 2023; originally announced October 2023.

  46. arXiv:2310.06339  [pdf, other

    eess.IV cs.LG

    Automatic nodule identification and differentiation in ultrasound videos to facilitate per-nodule examination

    Authors: Siyuan Jiang, Yan Ding, Yuling Wang, Lei Xu, Wenli Dai, Wanru Chang, Jianfeng Zhang, Jie Yu, Jianqiao Zhou, Chunquan Zhang, Ping Liang, Dexing Kong

    Abstract: Ultrasound is a vital diagnostic technique in health screening, with the advantages of non-invasive, cost-effective, and radiation free, and therefore is widely applied in the diagnosis of nodules. However, it relies heavily on the expertise and clinical experience of the sonographer. In ultrasound images, a single nodule might present heterogeneous appearances in different cross-sectional views w… ▽ More

    Submitted 10 October, 2023; originally announced October 2023.

  47. VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv , et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi… ▽ More

    Submitted 7 October, 2023; originally announced October 2023.

    Journal ref: The latest VisionFM work has been published in NEJM AI, 2024

  48. arXiv:2306.16846  [pdf, other

    cs.CV eess.IV

    Lightweight texture transfer based on texture feature preset

    Authors: ShiQi Jiang

    Abstract: In the task of texture transfer, reference texture images typically exhibit highly repetitive texture features, and the texture transfer results from different content images under the same style also share remarkably similar texture patterns. Encoding such highly similar texture features often requires deep layers and a large number of channels, making it is also the main source of the entire mod… ▽ More

    Submitted 1 January, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

  49. arXiv:2306.10515  [pdf, other

    eess.SP cs.CV

    Vision Guided MIMO Radar Beamforming for Enhanced Vital Signs Detection in Crowds

    Authors: Shuaifeng Jiang, Ahmed Alkhateeb, Daniel W. Bliss, Yu Rong

    Abstract: Radar as a remote sensing technology has been used to analyze human activity for decades. Despite all the great features such as motion sensitivity, privacy preservation, penetrability, and more, radar has limited spatial degrees of freedom compared to optical sensors and thus makes it challenging to sense crowded environments without prior information. In this paper, we develop a novel dual-sensi… ▽ More

    Submitted 18 June, 2023; originally announced June 2023.

  50. Zero-shot Medical Image Translation via Frequency-Guided Diffusion Models

    Authors: Yunxiang Li, Hua-Chieh Shao, Xiao Liang, Liyuan Chen, Ruiqi Li, Steve Jiang, Jing Wang, You Zhang

    Abstract: Recently, the diffusion model has emerged as a superior generative model that can produce high quality and realistic images. However, for medical image translation, the existing diffusion models are deficient in accurately retaining structural information since the structure details of source domain images are lost during the forward diffusion process and cannot be fully recovered through learned… ▽ More

    Submitted 27 October, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Journal ref: IEEE Transactions on Medical Imaging, 2023