Showing 1–30 of 30 results for author: Fan, S

Searching in archive eess.
  1. arXiv:2506.19885  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    FlightKooba: A Fast Interpretable FTP Model

    Authors: Jing Lu, Xuan Wu, Yizhun Tian, Songhan Fan, Yali Fang

    Abstract: The Koopman theory is a powerful and effective modeling tool for converting nonlinear systems into linear representations, and flight trajectory prediction (FTP) is a complex nonlinear system. However, current models applying the Koopman theory to FTP tasks are not very effective, model interpretability is indeed an issue, and the Koopman operators are computationally intensive, resulting in long…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 7 figures

  2. arXiv:2506.06400  [pdf, ps, other]

    eess.IV cs.CV

    ResPF: Residual Poisson Flow for Efficient and Physically Consistent Sparse-View CT Reconstruction

    Authors: Changsheng Fang, Yongtong Liu, Bahareh Morovati, Shuo Han, Yu Shi, Li Zhou, Shuyi Fan, Hengyong Yu

    Abstract: Sparse-view computed tomography (CT) is a practical solution to reduce radiation dose, but the resulting ill-posed inverse problem poses significant challenges for accurate image reconstruction. Although deep learning and diffusion-based methods have shown promising results, they often lack physical interpretability or suffer from high computational costs due to iterative sampling starting from ra…

    Submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2502.18008  [pdf, other]

    cs.SD cs.AI eess.AS

    NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

    Authors: Yashan Wang, Shangda Wu, Jianhuai Hu, Xingjian Du, Yueqi Peng, Yongxin Huang, Shuai Fan, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fin…

    Submitted 21 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  4. arXiv:2502.17499  [pdf]

    eess.SP cs.AI cs.LG math.NA

    Detecting Long QT Syndrome and First-Degree Atrioventricular Block using Single-Lead AI-ECG: A Multi-Center Real-World Study

    Authors: Sumei Fan, Deyun Zhang, Yue Wang, Shijia Geng, Kun Lu, Meng Sang, Weilun Xu, Haixue Wang, Qinghao Zhao, Chuandong Cheng, Peng Wang, Shenda Hong

    Abstract: Home-based single-lead AI-ECG devices have enabled continuous, real-world cardiac monitoring. However, the accuracy of parameter calculations from single-lead AI-ECG algorithm remains to be fully validated, which is critical for conditions such as Long QT Syndrome (LQTS) and First-Degree Atrioventricular Block (AVBI). In this multicenter study, we assessed FeatureDB, an ECG measurements computatio…

    Submitted 26 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 29 pages, 11 figures, 8 tables

  5. arXiv:2502.16584  [pdf, other]

    cs.SD cs.AI cs.CL cs.MM eess.AS

    Audio-FLAN: A Preliminary Release

    Authors: Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learnin…

    Submitted 23 February, 2025; originally announced February 2025.

  6. arXiv:2501.08868  [pdf, other]

    eess.SY cs.HC

    Processing and Analyzing Real-World Driving Data: Insights on Trips, Scenarios, and Human Driving Behaviors

    Authors: Jihun Han, Dominik Karbowski, Ayman Moawad, Namdoo Kim, Aymeric Rousseau, Shihong Fan, Jason Hoon Lee, Jinho Ha

    Abstract: Analyzing large volumes of real-world driving data is essential for providing meaningful and reliable insights into real-world trips, scenarios, and human driving behaviors. To this end, we developed a multi-level data processing approach that adds new information, segments data, and extracts desired parameters. Leveraging a confidential but extensive dataset (over 1 million km), this approach lea…

    Submitted 15 January, 2025; originally announced January 2025.

  7. arXiv:2501.06115  [pdf]

    cs.RO eess.SY

    Development of an Advisory System for Parking of a Car and Trailer

    Authors: Xincheng Cao, Haochong Chen, Bilin Aksun Guvenc, Levent Guvenc, Shihong Fan, John Harber, Brian Link, Peter Richmond, Dokyung Yim

    Abstract: Trailer parking is a challenging task due to the unstable nature of the vehicle-trailer system in reverse motion and the unintuitive steering actions required at the vehicle to accomplish the parking maneuver. This paper presents a strategy to tackle this kind of maneuver with an advisory graphic aid to help the human driver with the task of manually backing up the vehicle-trailer system. A kinema…

    Submitted 10 January, 2025; originally announced January 2025.

  8. arXiv:2412.19078  [pdf, other]

    eess.AS eess.SP

    Graph-Enhanced Dual-Stream Feature Fusion with Pre-Trained Model for Acoustic Traffic Monitoring

    Authors: Shitong Fan, Feiyang Xiao, Wenbo Wang, Shuhan Qi, Qiaoxi Zhu, Wenwu Wang, Jian Guan

    Abstract: Microphone array techniques are widely used in sound source localization and smart city acoustic-based traffic monitoring, but these applications face significant challenges due to the scarcity of labeled real-world traffic audio data and the complexity and diversity of application scenarios. The DCASE Challenge's Task 10 focuses on using multi-channel audio signals to count vehicles (cars or comm…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Shitong Fan and Feiyang Xiao contributed equally. Accepted by the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025

  9. arXiv:2411.13298   

    eess.SP

    A CSI Feedback Framework based on Transmitting the Important Values and Generating the Others

    Authors: Zhilin Du, Zhenyu Liu, Haozhen Li, Shilong Fan, Xinyu Gu, Lin Zhang

    Abstract: The application of deep learning (DL)-based channel state information (CSI) feedback frameworks in massive multiple-input multiple-output (MIMO) systems has significantly improved reconstruction accuracy. However, the limited generalization of widely adopted autoencoder-based networks for CSI feedback challenges consistent performance under dynamic wireless channel conditions and varying communica…

    Submitted 28 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: I have to make some modifications to the test dataset and contrast methods in the experimental results segment

  10. arXiv:2410.15078  [pdf, other]

    eess.AS eess.SP

    Independent Feature Enhanced Crossmodal Fusion for Match-Mismatch Classification of Speech Stimulus and EEG Response

    Authors: Shitong Fan, Wenbo Wang, Feiyang Xiao, Shiheng Zhang, Qiaoxi Zhu, Jian Guan

    Abstract: It is crucial for auditory attention decoding to classify matched and mismatched speech stimuli with corresponding EEG responses by exploring their relationship. However, existing methods often adopt two independent networks to encode speech stimulus and EEG response, which neglect the relationship between these signals from the two modalities. In this paper, we propose an independent feature enha…

    Submitted 19 October, 2024; originally announced October 2024.

    Comments: Shitong Fan and Wenbo Wang contributed equally. Accepted by the International Symposium on Chinese Spoken Language Processing (ISCSLP) 2024

  11. arXiv:2410.13992  [pdf]

    eess.SY

    Resilience-Oriented DG Siting and Sizing Considering Energy Equity Constraint

    Authors: Chenchen Li, Fangxing Li, Sufan Jiang, Jin Zhao, Shiyuan Fan, Leon M. Tolbert

    Abstract: Extreme weather events can cause widespread power outages and huge economic losses. Low-income customers are more vulnerable to power outages because they live in areas with poorly equipped distribution systems. However, existing approaches to improve grid resilience focus on the overall condition of the system and ignore the outage experiences of low-income customers, which leads to significant e…

    Submitted 17 October, 2024; originally announced October 2024.

  12. A novel pedestrian road crossing simulator for dynamic traffic light scheduling systems

    Authors: Dayuan Tan, Mohamed Younis, Wassila Lalouani, Shuyao Fan, Guozhi Song

    Abstract: The major advances in intelligent transportation systems are pushing societal services toward autonomy where road management is to be more agile in order to cope with changes and continue to yield optimal performance. However, the pedestrian experience is not sufficiently considered. Particularly, signalized intersections are expected to be popular if not dominant in urban settings where pedestria…

    Submitted 17 September, 2024; originally announced September 2024.

    Journal ref: Journal of Intelligent Transportation Systems 28.5 (2024): 636-650

  13. arXiv:2407.14904  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning

    Authors: Chen Shen, Chunfeng Lian, Wanqing Zhang, Fan Wang, Jianhua Zhang, Shuanliang Fan, Xin Wei, Gongji Wang, Kehan Li, Hongshu Mu, Hao Wu, Xinggong Liang, Jianhua Ma, Zhenyuan Wang

    Abstract: Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi u…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 28 pages, 6 figures, under review

  14. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 27 May, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2025 (Main)

  15. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most of the traditional hydroelectric unit fault localization methods are difficult to carry out accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  16. arXiv:2404.15339  [pdf, other]

    eess.IV

    Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation

    Authors: Yuehao Wang, Bingchen Gong, Yonghao Long, Siu Hin Fan, Qi Dou

    Abstract: The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate sha…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 figures. Accepted by International Journal of Computer Assisted Radiology and Surgery

  17. arXiv:2404.06079  [pdf, other]

    eess.AS cs.AI

    The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu

    Abstract: Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challen…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 3 figures. Report of a challenge

  18. arXiv:2403.19185  [pdf, other]

    cs.IT eess.SP

    Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning

    Authors: Suhang Fan, Wei Xu, Renjie Xie, Shi Jin, Derrick Wing Kwan Ng, Naofal Al-Dhahir

    Abstract: Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the…

    Submitted 28 March, 2024; originally announced March 2024.

  19. arXiv:2401.08926  [pdf, ps, other]

    cs.CV eess.IV

    Stochasticity-aware No-Reference Point Cloud Quality Assessment

    Authors: Songlin Fan, Wei Gao, Zhineng Chen, Ge Li, Guoqing Liu, Qicheng Wang

    Abstract: The evolution of point cloud processing algorithms necessitates an accurate assessment for their quality. Previous works consistently regard point cloud quality assessment (PCQA) as a MOS regression problem and devise a deterministic mapping, ignoring the stochasticity in generating MOS from subjective tests. This work presents the first probabilistic architecture for no-reference PCQA, motivated…

    Submitted 15 June, 2025; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to IJCAI 2025

  20. VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

    Journal ref: The latest VisionFM work has been published in NEJM AI, 2024

  21. arXiv:2309.15529  [pdf]

    eess.IV cs.CV cs.LG

    Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data

    Authors: Muyu Wang, Shiyu Fan, Yichen Li, Hui Chen

    Abstract: Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data…

    Submitted 27 September, 2023; originally announced September 2023.

  22. arXiv:2305.03250  [pdf, other]

    physics.optics eess.SP

    Experimentally Realizing Convolution Processing in the Photonic Synthetic Frequency Dimension

    Authors: Lingling Fan, Kai Wang, Heming Wang, Avik Dutt, Shanhui Fan

    Abstract: Convolution is an essential operation in signal and image processing and consumes most of the computing power in convolutional neural networks. Photonic convolution has the promise of addressing computational bottlenecks and outperforming electronic implementations. Performing photonic convolution in the synthetic frequency dimension, which harnesses the dynamics of light in the spectral degrees o…

    Submitted 11 August, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Science Advances, in press

  23. arXiv:2301.03331  [pdf, other]

    cs.CV cs.AI eess.IV

    A Specific Task-oriented Semantic Image Communication System for substation patrol inspection

    Authors: Senran Fan, Haotai Liang, Chen Dong, Xiaodong Xu, Geng Liu

    Abstract: Intelligent inspection robots are widely used in substation patrol inspection, which can help check potential safety hazards by patrolling the substation and sending back scene images. However, when patrolling some marginal areas with weak signal, the scene images cannot be successfully transmitted to be used for hidden danger elimination, which greatly reduces the quality of robots' daily work. To…

    Submitted 13 April, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: 9 pages, 8 figures

    Journal ref: IEEE Transactions on Power Delivery; vol. 39; no. 2; pp. 835-844; April 2024

  24. arXiv:2210.16935  [pdf, other]

    physics.optics cs.ET eess.SP

    Scalable and self-correcting photonic computation using balanced photonic binary tree cascades

    Authors: Sunil Pai, Olav Solgaard, Shanhui Fan, David A. B. Miller

    Abstract: Programmable unitary photonic networks that interfere hundreds of modes are emerging as a key technology in energy-efficient sensing, machine learning, cryptography, and linear optical quantum computing applications. In this work, we establish a theoretical framework to quantify error tolerance and scalability in a more general class of "binary tree cascade" programmable photonic networks that ac…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 32 pages, 12 figures

  25. Parameter Identification of a PN-Guided Incoming Missile Using an Improved Multiple-Model Mechanism

    Authors: Yinhan Wang, Jiang Wang, Shipeng Fan

    Abstract: An active defense against an incoming missile requires information of it, including a guidance law parameter and a first-order lateral time constant. To this end, assuming that a missile with a proportional navigation (PN) guidance law attempts to attack an aerial target with bang-bang evasive maneuvers, a parameter identification model based on the gated recurrent unit (GRU) neural network is bui…

    Submitted 25 January, 2022; originally announced February 2022.

    Comments: 9 pages, 10 figures

  26. arXiv:2011.07210  [pdf, other]

    cs.IT eess.SP

    Rate Splitting Multiple Access for Joint Communication and Sensing Systems with Unmanned Aerial Vehicles

    Authors: Yuwei Li, Wanli Ni, Hui Tian, Meihui Hua, Shaoshuai Fan

    Abstract: This paper investigates the problem of resource allocation for joint communication and radar sensing system on rate-splitting multiple access (RSMA) based unmanned aerial vehicle (UAV) system. UAV simultaneously communicates with multiple users and probes signals to targets of interest to exploit cooperative sensing ability and achieve substantial gains in size, cost and power consumption. By virt…

    Submitted 12 July, 2021; v1 submitted 13 November, 2020; originally announced November 2020.

  27. Data Age Aware Scheduling for Wireless Powered Mobile-Edge Computing in Industrial Internet of Things

    Authors: Hao Wu, Hui Tian, Shaoshuai Fan, Jiazhi Ren

    Abstract: Wireless powered mobile edge computing has been envisioned as a promising paradigm to enhance the computation capability of low-power wireless devices in Industrial Internet of Things. An efficient resource scheduling method is critical yet challenging to design in such a scenario due to stochastic traffic arrival, time-coupling uplink/downlink decision and incomplete system state knowledge. To ta…

    Submitted 26 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: 21 pages, 4 figures, submitted to IEEE Transactions on Industrial Informatics

  28. arXiv:1903.04579  [pdf, other]

    eess.SP cs.NE physics.optics

    Reprogrammable Electro-Optic Nonlinear Activation Functions for Optical Neural Networks

    Authors: Ian A. D. Williamson, Tyler W. Hughes, Momchil Minkov, Ben Bartlett, Sunil Pai, Shanhui Fan

    Abstract: We introduce an electro-optic hardware platform for nonlinear activation functions in optical neural networks. The optical-to-optical nonlinearity operates by converting a small portion of the input optical signal into an analog electric signal, which is used to intensity-modulate the original optical signal with no reduction in processing speed. Our scheme allows for complete nonlinear on-off con…

    Submitted 22 July, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: 12 pages, 6 figures

    Journal ref: IEEE Journal of Selected Topics in Quantum Electronics, vol. 26, no. 1, pp. 1-12, Jan. 2020

  29. arXiv:1901.08118  [pdf]

    eess.IV physics.optics

    Imaging-free object recognition enabled by optical coherence

    Authors: Yixuan Tan, Xin Lei, Xingze Wang, Shanhui Fan, Zongfu Yu

    Abstract: Visual object recognition is one of the most important perception functions for a wide range of intelligent machines. A conventional recognition process begins with forming a clear optical image of the object, followed by its computer analysis. In contrast, it is possible to carry out recognition without imaging by using coherent illumination and directly analyzing the optical interference pattern…

    Submitted 23 January, 2019; originally announced January 2019.

  30. Single wavelength 480 Gb/s direct detection over 80km SSMF enabled by Stokes Vector Kramers Kronig transceiver

    Authors: Thang Hoang, Mohammed Sowailem, Qunbi Zhuge, Zhenping Xing, Mohamed Morsy-Osman, Eslam El-Fiky, Sujie Fan, Meng Xiang, David V. Plant

    Abstract: We propose 4D modulation with directed detection employing a novel Stokes-Vector Kramers-Kronig transceiver. It shows that employing Stokes vector receiver, transmitted digital carrier and Kramers-Kronig detection offers an effective way to de-rotate polarization multiplexed complex double side band signal without using a local oscillator at receiver. The impact of system parameters and configurat…

    Submitted 27 October, 2017; originally announced October 2017.