
Showing 1–50 of 100 results for author: Jin, Y

Searching in archive eess.
  1. arXiv:2507.04435  [pdf, ps, other]

    eess.SP

    Context-Aware Deep Learning for Robust Channel Extrapolation in Fluid Antenna Systems

    Authors: Yanliang Jin, Runze Yu, Yuan Gao, Shengli Liu, Xiaoli Chu, Kai-Kit Wong, Chan-Byoung Chae

    Abstract: Fluid antenna systems (FAS) offer remarkable spatial flexibility but face significant challenges in acquiring high-resolution channel state information (CSI), leading to considerable overhead. To address this issue, we propose CANet, a robust deep learning model for channel extrapolation in FAS. CANet combines context-adaptive modeling with a cross-scale attention mechanism and is built on a ConvN…

    Submitted 6 July, 2025; originally announced July 2025.

  2. arXiv:2506.23759  [pdf, ps, other]

    eess.IV cs.CV

    Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos

    Authors: Zheng Fang, Xiaoming Qi, Chun-Mei Feng, Jialun Pei, Weixin Si, Yueming Jin

    Abstract: Surgical instrument segmentation under Federated Learning (FL) is a promising direction, which enables multiple surgical sites to collaboratively train the model without centralizing datasets. However, there exist very limited FL works in surgical data science, and FL methods for other modalities do not consider inherent characteristics in surgical domain: i) different scenarios show diverse anato…

    Submitted 30 June, 2025; originally announced June 2025.

  3. arXiv:2505.21928  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    Subspecialty-Specific Foundation Model for Intelligent Gastrointestinal Pathology

    Authors: Lianghui Zhu, Xitong Ling, Minxi Ouyang, Xiaoping Liu, Tian Guan, Mingxi Fu, Zhiqiang Cheng, Fanglei Fu, Maomao Zeng, Liming Liu, Song Duan, Qiang Huang, Ying Xiao, Jianming Li, Shanming Lu, Zhenghua Piao, Mingxi Zhu, Yibo Jin, Shan Xu, Qiming He, Yizhi Wang, Junru Cheng, Xuanyu Wang, Luxi Xie, Houqiang Li, et al. (2 additional authors not shown)

    Abstract: Gastrointestinal (GI) diseases represent a clinically significant burden, necessitating precise diagnostic approaches to optimize patient outcomes. Conventional histopathological diagnosis suffers from limited reproducibility and diagnostic variability. To overcome these limitations, we develop Digepath, a specialized foundation model for GI pathology. Our framework introduces a dual-phase iterati…

    Submitted 6 June, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

  4. arXiv:2505.16305  [pdf, ps, other]

    cs.LG eess.SP

    Large-Scale Bayesian Tensor Reconstruction: An Approximate Message Passing Solution

    Authors: Bingyang Cheng, Zhongtao Chen, Yichen Jin, Hao Zhang, Chen Zhang, Edmud Y. Lam, Yik-Chung Wu

    Abstract: Tensor CANDECOMP/PARAFAC decomposition (CPD) is a fundamental model for tensor reconstruction. Although the Bayesian framework allows for principled uncertainty quantification and automatic hyperparameter learning, existing methods do not scale well for large tensors because of high-dimensional matrix inversions. To this end, we introduce CP-GAMP, a scalable Bayesian CPD algorithm. This algorithm…

    Submitted 22 May, 2025; originally announced May 2025.

  5. arXiv:2505.16211  [pdf, ps, other]

    cs.SD cs.AI cs.CL eess.AS

    AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

    Authors: Kai Li, Can Shen, Yile Liu, Jirui Han, Kelong Zheng, Xuechao Zou, Zhe Wang, Xingjian Du, Shun Zhang, Hanjun Luo, Yingbin Jin, Xinxin Xing, Ziyang Ma, Yue Liu, Xiaojun Jia, Yifan Zhang, Junfeng Fang, Kun Wang, Yibo Yan, Haoyang Li, Yiming Li, Xiaobin Zhuang, Yang Liu, Haibo Hu, Zhizheng Wu, et al. (6 additional authors not shown)

    Abstract: The rapid advancement and expanding applications of Audio Large Language Models (ALLMs) demand a rigorous understanding of their trustworthiness. However, systematic research on evaluating these models, particularly concerning risks unique to the audio modality, remains largely unexplored. Existing evaluation frameworks primarily focus on the text modality or address only a restricted set of safet…

    Submitted 1 July, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Technical Report

  6. arXiv:2505.08581  [pdf, other]

    cs.CV eess.IV q-bio.TO

    ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

    Authors: Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin

    Abstract: Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation is emerging, given its advantage of providing surgeons with an interactive experience to segment the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicabil…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Early accepted by MICCAI 2025

  7. arXiv:2505.04203  [pdf, ps, other]

    cs.GR cs.SD eess.AS

    ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition

    Authors: Zhiping Qiu, Yitong Jin, Yuan Wang, Yi Shi, Chongwu Wang, Chao Tan, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we pr…

    Submitted 1 July, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Journal ref: SIGGRAPH 2025

  8. arXiv:2505.00059  [pdf, other]

    cs.CL cs.SD eess.AS

    BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition

    Authors: Paige Tuttösí, Mantaj Dhillon, Luna Sang, Shane Eastwood, Poorvi Bhatia, Quang Minh Dinh, Avni Kapoor, Yewon Jin, Angelica Lim

    Abstract: Some speech recognition tasks, such as automatic speech recognition (ASR), are approaching or have reached human performance in many reported metrics. Yet, they continue to struggle in complex, real-world situations, such as with distanced speech. Previous challenges have released datasets to address the issue of distanced ASR; however, the focus remains primarily on distance, specifically relyin…

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: Accepted to Computer Speech and Language, Special issue: Multi-Speaker, Multi-Microphone, and Multi-Modal Distant Speech Recognition (September 2025)

  9. arXiv:2504.15756  [pdf, other]

    cs.CV eess.IV

    DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy

    Authors: Qirui Yang, Fangpu Zhang, Yeying Jin, Qihua Cheng, Pengtao Jiang, Huanjing Yue, Jingyu Yang

    Abstract: With the rapid advancement of mobile imaging, capturing screens using smartphones has become a prevalent practice in distance learning and conference recording. However, moiré artifacts, caused by frequency aliasing between display screens and camera sensors, are further amplified by the image signal processing pipeline, leading to severe visual degradation. Existing sRGB domain demoiréing methods…

    Submitted 22 April, 2025; originally announced April 2025.

  10. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  11. arXiv:2503.10522  [pdf, other]

    cs.MM cs.CV cs.LG cs.SD eess.AS

    AudioX: Diffusion Transformer for Anything-to-Audio Generation

    Authors: Zeyue Tian, Yizhu Jin, Zhaoyang Liu, Ruibin Yuan, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: Audio and music generation have emerged as crucial tasks in many applications, yet existing approaches face significant limitations: they operate in isolation without unified capabilities across modalities, suffer from scarce high-quality, multi-modal training data, and struggle to effectively integrate diverse inputs. In this work, we propose AudioX, a unified Diffusion Transformer model for Anyt…

    Submitted 23 April, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: The code and datasets will be available at https://zeyuet.github.io/AudioX/

  12. arXiv:2503.03199  [pdf, other]

    eess.IV q-bio.QM

    PathRWKV: Enabling Whole Slide Prediction with Recurrent-Transformer

    Authors: Sicheng Chen, Tianyi Zhang, Dankai Liao, Dandan Li, Low Chang Han, Yanqin Jiang, Yueming Jin, Shangqing Lyu

    Abstract: Pathological diagnosis plays a critical role in clinical practice, where the whole slide images (WSIs) are widely applied. Through a two-stage paradigm, recent deep learning approaches enhance the WSI analysis with tile-level feature extracting and slide-level feature modeling. Current Transformer models achieved improvement in the efficiency and accuracy to previous multiple instance learning bas…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 11 pages, 2 figures

  13. arXiv:2503.02344  [pdf, other]

    eess.SP

    A General Optimization Framework for Tackling Distance Constraints in Movable Antenna-Aided Systems

    Authors: Yichen Jin, Qingfeng Lin, Yang Li, Hancheng Zhu, Bingyang Cheng, Yik-Chung Wu, Rui Zhang

    Abstract: The recently emerged movable antenna (MA) shows great promise in leveraging spatial degrees of freedom to enhance the performance of wireless systems. However, resource allocation in MA-aided systems faces challenges due to the nonconvex and coupled constraints on antenna positions. This paper systematically reveals the challenges posed by the minimum antenna separation distance constraints. Furth…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 8 figures

  14. arXiv:2502.05228  [pdf]

    quant-ph cs.AI eess.SY

    Multi-Objective Mobile Damped Wave Algorithm (MOMDWA): A Novel Approach For Quantum System Control

    Authors: Juntao Yu, Jiaquan Yu, Dedai Wei, Xinye Sha, Shengwei Fu, Miuyu Qiu, Yurun Jin, Kaichen Ouyang

    Abstract: In this paper, we introduce a novel multi-objective optimization algorithm, the Multi-Objective Mobile Damped Wave Algorithm (MOMDWA), specifically designed to address complex quantum control problems. Our approach extends the capabilities of the original Mobile Damped Wave Algorithm (MDWA) by incorporating multiple objectives, enabling a more comprehensive optimization process. We applied MOMDWA…

    Submitted 6 February, 2025; originally announced February 2025.

  15. arXiv:2502.04128  [pdf, other]

    eess.AS cs.AI cs.CL cs.MM cs.SD

    Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

    Authors: Zhen Ye, Xinfa Zhu, Chi-Min Chan, Xinsheng Wang, Xu Tan, Jiahe Lei, Yi Peng, Haohe Liu, Yizhu Jin, Zheqi Dai, Hongzhan Lin, Jianyi Chen, Xingjian Du, Liumeng Xue, Yunlin Chen, Zhifei Li, Lei Xie, Qiuqiang Kong, Yike Guo, Wei Xue

    Abstract: Recent advances in text-based large language models (LLMs), particularly in the GPT series and the o1 model, have demonstrated the effectiveness of scaling both training-time and inference-time compute. However, current state-of-the-art TTS systems leveraging LLMs are often multi-stage, requiring separate models (e.g., diffusion models after LLM), complicating the decision of whether to scale a pa…

    Submitted 22 February, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

  16. arXiv:2502.03494  [pdf, other]

    eess.IV

    Integral Fast Fourier Color Constancy

    Authors: Wenjun Wei, Yanlin Qian, Huaian Chen, Junkang Dai, Yi Jin

    Abstract: Traditional auto white balance (AWB) algorithms typically assume a single global illuminant source, which leads to color distortions in multi-illuminant scenes. While recent neural network-based methods have shown excellent accuracy in such scenarios, their high parameter count and computational demands limit their practicality for real-time video applications. The Fast Fourier Color Constancy (FF…

    Submitted 4 February, 2025; originally announced February 2025.

  17. arXiv:2501.00909  [pdf, other]

    cs.IT eess.SP

    RIS-Aided Integrated Sensing and Communication Systems under Dual-polarized Channels

    Authors: Dongnan Xia, Cunhua Pan, Hong Ren, Zhiyuan Yu, Yasheng Jin, Jiangzhou Wang

    Abstract: This paper considers reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) systems under dual-polarized (DP) channels. Unlike the existing ISAC systems, which ignored polarization of electromagnetic waves, this study adopts DP base station (BS) and DP RIS to serve users with a pair of DP antennas. The achievable sum rate is maximized through jointly optimiz…

    Submitted 1 January, 2025; originally announced January 2025.

  18. arXiv:2411.18266  [pdf]

    eess.AS cs.AI cs.SD eess.SY

    Wearable intelligent throat enables natural speech in stroke patients with dysarthria

    Authors: Chenyu Tang, Shuo Gao, Cong Li, Wentian Yi, Yuxuan Jin, Xiaoxue Zhai, Sixuan Lei, Hongbei Meng, Zibo Zhang, Muzi Xu, Shengbo Wang, Xuhang Chen, Chenxi Wang, Hongyun Yang, Ningli Wang, Wenyu Wang, Jin Cao, Xiaodong Feng, Peter Smielewski, Yu Pan, Wenhui Song, Martin Birchall, Luigi G. Occhipinti

    Abstract: Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to ena…

    Submitted 14 March, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: 5 figures, 45 references

  19. arXiv:2411.08902  [pdf, other]

    eess.SP cs.NI

    A Range-Free Node Localization Method for Anisotropic Wireless Sensor Networks with Sparse Anchors

    Authors: Yong Jin, Junfang Leng, Lin Zhou, Yu Jiang, Qian Wei

    Abstract: In sensor networks characterized by irregular layouts and poor connectivity, anisotropic properties can significantly reduce the accuracy of distance estimation between nodes, consequently impairing the localization precision of unidentified nodes. Since distance estimation is contingent upon the multi-hop paths between anchor node pairs, assigning differential weights based on the reliability of…

    Submitted 29 October, 2024; originally announced November 2024.

  20. arXiv:2411.08835  [pdf, other]

    cs.RO eess.SY

    Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections

    Authors: Shutong Chen, Emmanouil Spyrakos-Papastavridis, Yichao Jin, Yansha Deng

    Abstract: As one of the most promising technologies in industry, the Digital Twin (DT) facilitates real-time monitoring and predictive analysis for real-world systems by precisely reconstructing virtual replicas of physical entities. However, this reconstruction faces unprecedented challenges due to the ever-increasing communication overhead, especially for digital robot arm reconstruction. To this end, we p…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: Submitted to IEEE for potential publication

  21. arXiv:2411.08014  [pdf]

    cs.CV eess.IV

    Artistic Neural Style Transfer Algorithms with Activation Smoothing

    Authors: Xiangtian Li, Han Cao, Zhaoyang Zhang, Jiacheng Hu, Yuhui Jin, Zihao Zhao

    Abstract: The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore utilizing ResNet with activation smoothing in NST. Extensive experimental…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: 8 pages, 7 figures

  22. arXiv:2411.01575  [pdf, other]

    eess.IV cs.CV

    HC$^3$L-Diff: Hybrid conditional latent diffusion with high frequency enhancement for CBCT-to-CT synthesis

    Authors: Shi Yin, Hongqi Tan, Li Ming Chong, Haofeng Liu, Hui Liu, Kang Hao Lee, Jeffrey Kit Loong Tuan, Dean Ho, Yueming Jin

    Abstract: Background: Cone-beam computed tomography (CBCT) plays a crucial role in image-guided radiotherapy, but artifacts and noise make them unsuitable for accurate dose calculation. Artificial intelligence methods have shown promise in enhancing CBCT quality to produce synthetic CT (sCT) images. However, existing methods either produce images of suboptimal quality or incur excessive time costs, failing…

    Submitted 3 November, 2024; originally announced November 2024.

    Comments: 13 pages, 5 figures

  23. arXiv:2411.01019  [pdf, other]

    eess.IV cs.CV

    A lightweight Convolutional Neural Network based on U shape structure and Attention Mechanism for Anterior Mediastinum Segmentation

    Authors: Sina Soleimani-Fard, Won Gi Jeong, Francis Ferri Ripalda, Hasti Sasani, Younhee Choi, S Deiva, Gong Yong Jin, Seok-bum Ko

    Abstract: To automatically detect Anterior Mediastinum Lesions (AMLs) in the Anterior Mediastinum (AM), the primary requirement will be an automatic segmentation model specifically designed for the AM. The prevalence of AML is extremely low, making it challenging to conduct screening research similar to lung cancer screening. Retrospectively reviewing chest CT scans over a specific period to investigate the…

    Submitted 1 November, 2024; originally announced November 2024.

  24. arXiv:2410.21351  [pdf, other]

    cs.LG cs.AI cs.NI eess.SP

    LinFormer: A Linear-based Lightweight Transformer Architecture For Time-Aware MIMO Channel Prediction

    Authors: Yanliang Jin, Yifan Wu, Yuan Gao, Shunqing Zhang, Shugong Xu, Cheng-Xiang Wang

    Abstract: The emergence of 6th generation (6G) mobile networks brings new challenges in supporting high-mobility communications, particularly in addressing the issue of channel aging. Existing channel prediction methods offer improved accuracy at the expense of increased computational complexity, limiting their practical application in mobile networks. To address these challenges, we present LinFormer…

    Submitted 28 October, 2024; originally announced October 2024.

  25. arXiv:2410.21276  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY cs.LG cs.SD eess.AS

    GPT-4o System Card

    Authors: OpenAI, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, et al. (395 additional authors not shown)

    Abstract: GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 mil…

    Submitted 25 October, 2024; originally announced October 2024.

  26. arXiv:2410.06542  [pdf, other]

    eess.IV cs.CV

    MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging

    Authors: Noel C. F. Codella, Ying Jin, Shrey Jain, Yu Gu, Ho Hin Lee, Asma Ben Abacha, Alberto Santamaria-Pang, Will Guyman, Naiteek Sangani, Sheng Zhang, Hoifung Poon, Stephanie Hyland, Shruthi Bannur, Javier Alvarez-Valle, Xue Li, John Garrett, Alan McMillan, Gaurav Rajguru, Madhu Maddi, Nilesh Vijayrania, Rehaan Bhimai, Nick Mecklenburg, Rupal Jain, Daniel Holstein, Naveen Gaur, et al. (6 additional authors not shown)

    Abstract: In this work, we present MedImageInsight, an open-source medical imaging embedding model. MedImageInsight is trained on medical images with associated text and labels across a diverse collection of domains, including X-Ray, CT, MRI, dermoscopy, OCT, fundus photography, ultrasound, histopathology, and mammography. Rigorous evaluations demonstrate MedImageInsight's ability to achieve state-of-the-ar…

    Submitted 9 October, 2024; originally announced October 2024.

  27. arXiv:2409.09052  [pdf, other]

    eess.IV cs.AI cs.CV

    OrthoDoc: Multimodal Large Language Model for Assisting Diagnosis in Computed Tomography

    Authors: Youzhu Jin, Yichen Zhang

    Abstract: Multimodal large language models (MLLMs) have achieved significant success in the general field of image processing. Their emerging task generalization and freeform conversational capabilities can greatly facilitate medical diagnostic assistance, helping patients better understand their conditions and enhancing doctor-patient trust. Computed Tomography (CT) is a non-invasive imaging technique used…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: 8 pages, 1 figure

  28. arXiv:2408.07931  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

    Authors: Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin

    Abstract: Surgical video segmentation is a critical task in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, the Segment Anything Model 2 (SAM2) framework has shown superior advancements in image and video segmentation. However, SAM2 struggles with efficiency due to the high computational demands of processing high-resolution images and complex and long-r…

    Submitted 11 March, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by NeurIPS 2024 Workshop AIM-FM

  29. arXiv:2408.04535  [pdf, other]

    eess.IV cs.AI

    Synchronous Multi-modal Semantic Communication System with Packet-level Coding

    Authors: Yun Tian, Jingkai Ying, Zhijin Qin, Ye Jin, Xiaoming Tao

    Abstract: Although the semantic communication with joint semantic-channel coding design has shown promising performance in transmitting data of different modalities over physical layer channels, the synchronization and packet-level forward error correction of multimodal semantics have not been well studied. Due to the independent design of semantic encoders, synchronizing multimodal features in both the sem…

    Submitted 10 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 12 pages, 9 figures

  30. arXiv:2408.00753  [pdf]

    eess.SP cs.AI

    A deep learning-enabled smart garment for accurate and versatile sleep conditions monitoring in daily life

    Authors: Chenyu Tang, Wentian Yi, Muzi Xu, Yuxuan Jin, Zibo Zhang, Xuhang Chen, Caizhi Liao, Peter Smielewski, Luigi G. Occhipinti

    Abstract: In wearable smart systems, continuous monitoring and accurate classification of different sleep-related conditions are critical for enhancing sleep quality and preventing sleep-related chronic conditions. However, the requirements for device-skin coupling quality in electrophysiological sleep monitoring systems hinder the comfort and reliability of night wearing. Here, we report a washable, skin-c…

    Submitted 3 October, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 20 pages, 5 figures, 1 table

  31. arXiv:2407.13092  [pdf, other]

    eess.IV cs.CV

    CC-DCNet: Dynamic Convolutional Neural Network with Contrastive Constraints for Identifying Lung Cancer Subtypes on Multi-modality Images

    Authors: Yuan Jin, Gege Ma, Geng Chen, Tianling Lyu, Jan Egger, Junhui Lyu, Shaoting Zhang, Wentao Zhu

    Abstract: The accurate diagnosis of pathological subtypes of lung cancer is of paramount importance for follow-up treatment and prognosis management. Assessment methods utilizing deep learning technologies have introduced novel approaches for clinical diagnosis. However, the majority of existing models rely solely on single-modality image input, leading to limited diagnostic accuracy. To this end, we prop…

    Submitted 17 July, 2024; originally announced July 2024.

  32. arXiv:2407.08230  [pdf, other]

    eess.SP

    Handling Distance Constraint in Movable Antenna Aided Systems: A General Optimization Framework

    Authors: Yichen Jin, Qingfeng Lin, Yang Li, Yik-Chung Wu

    Abstract: The movable antenna (MA) is a promising technology to exploit more spatial degrees of freedom for enhancing wireless system performance. However, the MA-aided system introduces non-convex antenna distance constraints, which pose challenges in the underlying optimization problems. To fill this gap, this paper proposes a general framework for optimizing the MA-aided system under the antenna dis…

    Submitted 11 July, 2024; originally announced July 2024.

  33. arXiv:2407.07728   

    cs.SD cs.AI cs.MM eess.AS

    SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement

    Authors: Zihao Wang, Le Ma, Yongsheng Feng, Xin Pan, Yuhang Jin, Kejun Zhang

    Abstract: Singing voice conversion (SVC) aims to convert a singer's voice to another singer's from a reference audio while keeping the original semantics. However, existing SVC methods can hardly perform zero-shot due to incomplete feature disentanglement or dependence on the speaker look-up table. We propose the first open-source high-quality zero-shot SVC model SaMoye that can convert singing to human and…

    Submitted 15 November, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper needs major changes for resubmit

    MSC Class: 68Txx(Primary)14F05; 91Fxx(Secondary)

    ACM Class: I.2.7; J.5

  34. arXiv:2406.14534  [pdf, other]

    eess.IV cs.CV

    Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration

    Authors: Long Lei, Jun Zhou, Jialun Pei, Baoliang Zhao, Yueming Jin, Yuen-Chun Jeremy Teoh, Jing Qin, Pheng-Ann Heng

    Abstract: A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of the intraoperative 2D images and preoperative 3D volume based on the ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimension variations betwe…

    Submitted 17 January, 2025; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by MICCAI 2024

  35. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes,…

    Submitted 1 April, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE TPAMI. Project website: https://whu-sigma.github.io/HyperSIGMA

  36. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  37. arXiv:2405.17270  [pdf, other]

    eess.SP

    Towards Accurate Ego-lane Identification with Early Time Series Classification

    Authors: Yuchuan Jin, Theodor Stenhammar, David Bejmer, Axel Beauvisage, Yuxuan Xia, Junsheng Fu

    Abstract: Accurate and timely determination of a vehicle's current lane within a map is a critical task in autonomous driving systems. This paper utilizes an Early Time Series Classification (ETSC) method to achieve precise and rapid ego-lane identification in real-world driving data. The method begins by assessing the similarities between map and lane markings perceived by the vehicle's camera using measur…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  38. arXiv:2405.10825  [pdf, other]

    eess.SY cs.LG

    Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities

    Authors: Hao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan Liu

    Abstract: Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks bas…

    Submitted 16 September, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  39. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  40. arXiv:2404.15284  [pdf, other]

    eess.SP cs.AI

    Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

    Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu Jin

    Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th…

    Submitted 12 March, 2024; originally announced April 2024.

  41. arXiv:2403.10931  [pdf, other]

    eess.IV cs.CV

    Uncertainty-Aware Adapter: Adapting Segment Anything Model (SAM) for Ambiguous Medical Image Segmentation

    Authors: Mingzhou Jiang, Jiaying Zhou, Junde Wu, Tianyang Wang, Yueming Jin, Min Xu

    Abstract: The Segment Anything Model (SAM) gained significant success in natural image segmentation, and many methods have tried to fine-tune it to medical image segmentation. An efficient way to do so is by using Adapters, specialized modules that learn just a few parameters to tailor SAM specifically for medical images. However, unlike natural images, many tissues and lesions in medical images have blurry…

    Submitted 18 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  42. arXiv:2403.07317  [pdf, other]

    eess.SY

    GMPC: Geometric Model Predictive Control for Wheeled Mobile Robot Trajectory Tracking

    Authors: Jiawei Tang, Shuang Wu, Bo Lan, Yahui Dong, Yuqiang Jin, Guangjian Tian, Wen-An Zhang, Ling Shi

    Abstract: The configuration of most robotic systems lies in continuous transformation groups. However, in mobile robot trajectory tracking, many recent works still naively utilize optimization methods for elements in vector space without considering the manifold constraint of the robot configuration. In this letter, we propose a geometric model predictive control (MPC) framework for wheeled mobile robot tra…

    Submitted 12 March, 2024; originally announced March 2024.

  43. arXiv:2402.19387  [pdf, other]

    eess.IV cs.CV

    SeD: Semantic-Aware Discriminator for Image Super-Resolution

    Authors: Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

    Abstract: Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner. However, the distribution learning is overly coarse-grained, which is susceptible to virtual textures and caus…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR2024

  44. arXiv:2402.11423  [pdf, other]

    cs.CR eess.SP

    VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger

    Authors: Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier Jin, Shuo Wang

    Abstract: Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vec…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by the 33rd USENIX Security Symposium

  45. arXiv:2402.09421  [pdf, other]

    eess.SP cs.LG

    EEG Based Generative Depression Discriminator

    Authors: Ziming Mao, Hao Wu, Yongxi Tan, Yuhe Jin

    Abstract: Depression is a very common but serious mood disorder. In this paper, we built a generative detection network (GDN) in accordance with three physiological laws. Our aim is that we expect the neural network to learn the relevant brain activity based on the EEG signal and, at the same time, to regenerate the target electrode signal based on the brain activity. We trained two generators, the first one…

    Submitted 19 January, 2024; originally announced February 2024.

  46. arXiv:2402.05847  [pdf, other]

    eess.SP

    Reconfigurable Intelligent Surface-Aided Dual-Function Radar and Communication Systems With MU-MIMO Communication

    Authors: Yasheng Jin, Hong Ren, Cunhua Pan, Zhiyuan Yu, Ruisong Weng, Boshi Wang, Gui Zhou, Yongchao He, Maged Elkashlan

    Abstract: In this paper, we investigate a reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system. Our objective is to maximize the achievable sum rate of the multi-antenna communication users through the joint active and passive beamforming. Specifically, the weighted minimum mean-square error (WMMSE) method is first used to reformulate the original problem i…

    Submitted 8 February, 2024; originally announced February 2024.

  47. arXiv:2401.14829  [pdf, other]

    cs.NI cs.SE eess.SY

    UMBRELLA: A One-stop Shop Bridging the Gap from Lab to Real-World IoT Experimentation

    Authors: Ioannis Mavromatis, Yichao Jin, Aleksandar Stanoev, Anthony Portelli, Ingram Weeks, Ben Holden, Eliot Glasspole, Tim Farnham, Aftab Khan, Usman Raza, Adnan Aijaz, Thomas Bierton, Ichiro Seto, Nita Patel, Mahesh Sooriyabandara

    Abstract: UMBRELLA is an open, large-scale IoT ecosystem deployed across South Gloucestershire, UK. It is intended to accelerate innovation across multiple technology domains. UMBRELLA is built to bridge the gap between existing specialised testbeds and address holistically real-world technological challenges in a System-of-Systems (SoS) fashion. UMBRELLA provides open access to real-world devices and infra…

    Submitted 2 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: Submitted for publication to IEEE Access

  48. arXiv:2312.07981  [pdf]

    cs.LG cs.SD eess.SP

    Time Series Diffusion Method: A Denoising Diffusion Probabilistic Model for Vibration Signal Generation

    Authors: Haiming Yi, Lei Hou, Yuhong Jin, Nasser A. Saeed, Ali Kandil, Hao Duan

    Abstract: Diffusion models have demonstrated powerful data generation capabilities in various research fields such as image generation. However, in the field of vibration signal generation, the criteria for evaluating the quality of the generated signal are different from that of image generation and there is a fundamental difference between them. At present, there is no research on the ability of diffusion…

    Submitted 30 June, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Journal ref: Mechanical Systems and Signal Processing, 2024, 216: 111481

  49. arXiv:2312.07826  [pdf]

    cs.RO eess.SY

    Integrated Path Tracking with DYC and MPC using LSTM Based Tire Force Estimator for Four-wheel Independent Steering and Driving Vehicle

    Authors: Sungjin Lim, Bilal Sadiq, Yongsik Jin, Sangho Lee, Gyeungho Choi, Kanghyun Nam, Yongseob Lim

    Abstract: Active collision avoidance system plays a crucial role in ensuring the lateral safety of autonomous vehicles, and it is primarily related to path planning and tracking control algorithms. In particular, the direct yaw-moment control (DYC) system can significantly improve the lateral stability of a vehicle in environments with sudden changes in road conditions. In order to apply the DYC algorithm,…

    Submitted 12 December, 2023; originally announced December 2023.

  50. arXiv:2310.19756  [pdf]

    eess.SP

    Transmission line condition prediction based on semi-supervised learning

    Authors: Sizhe Li, Xun Ma, Nan Liu, Yi Jin

    Abstract: Transmission line state assessment and prediction are of great significance for the rational formulation of operation and maintenance strategy and improvement of operation and maintenance level. Aiming at the problem that existing models cannot take into account the robustness and data demand, this paper proposes a state prediction method based on semi-supervised learning. Firstly, for the expande…

    Submitted 6 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.