
Showing 1–50 of 88 results for author: Yuan, S

  1. arXiv:2506.13106  [pdf, ps, other]

    cs.RO eess.SP

    Autonomous 3D Moving Target Encirclement and Interception with Range measurement

    Authors: Fen Liu, Shenghai Yuan, Thien-Minh Nguyen, Rong Su

    Abstract: Commercial UAVs are an emerging security threat as they are capable of carrying hazardous payloads or disrupting air traffic. To counter UAVs, we introduce an autonomous 3D target encirclement and interception strategy. Unlike traditional ground-guided systems, this strategy employs autonomous drones to track and engage non-cooperative hostile UAVs, which is effective in non-line-of-sight conditio…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: Paper has been accepted into IROS 2025

  2. arXiv:2506.10574  [pdf, ps, other]

    cs.CV cs.MM cs.SD eess.AS

    DanceChat: Large Language Model-Guided Music-to-Dance Generation

    Authors: Qing Wang, Xiaohang Yang, Yilan Dong, Naveen Raj Govindaraj, Gregory Slabaugh, Shanxin Yuan

    Abstract: Music-to-dance generation aims to synthesize human dance motion conditioned on musical input. Despite recent progress, significant challenges remain due to the semantic gap between music and dance motion, as music offers only abstract cues, such as melody, groove, and emotion, without explicitly specifying the physical movements. Moreover, a single piece of music can produce multiple plausible dan…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: check demos at https://dancechat.github.io/anon/

  3. arXiv:2506.08967  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

    Authors: Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, Hongyu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, et al. (51 additional authors not shown)

    Abstract: Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a du…

    Submitted 13 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures

  4. arXiv:2506.02574  [pdf, other]

    eess.IV cs.CV cs.MM

    Dynamic mapping from static labels: remote sensing dynamic sample generation with temporal-spectral embedding

    Authors: Shuai Yuan, Shuang Chen, Tianwu Lin, Jie Wang, Peng Gong

    Abstract: Accurate remote sensing geographic mapping depends heavily on representative and timely sample data. However, rapid changes in land surface dynamics necessitate frequent updates, quickly rendering previously collected samples obsolete and imposing significant labor demands for continuous manual updates. In this study, we aim to address this problem by dynamic sample generation using existing singl…

    Submitted 3 June, 2025; originally announced June 2025.

  5. arXiv:2505.12503  [pdf, ps, other]

    eess.SY

    Optimal Task and Motion Planning for Autonomous Systems Using Petri Nets

    Authors: Zhou He, Shilong Yuan, Ning Ran, Dimitri Lefebvre

    Abstract: This study deals with the problem of task and motion planning of autonomous systems within the context of high-level tasks. Specifically, a task comprises logical requirements (conjunctions, disjunctions, and negations) on the trajectories and final states of agents in certain regions of interest. We propose an optimal planning approach that combines offline computation and online planning. First,…

    Submitted 18 May, 2025; originally announced May 2025.

  6. arXiv:2505.06749  [pdf, ps, other]

    eess.SY

    AI-CDA4All: Democratizing Cooperative Autonomous Driving for All Drivers via Affordable Dash-cam Hardware and Open-source AI Software

    Authors: Shengming Yuan, Hao Zhou

    Abstract: As transportation technology advances, the demand for connected vehicle infrastructure has greatly increased to improve their efficiency and safety. One area of advancement, Cooperative Driving Automation (CDA), still relies on expensive autonomy sensors or connectivity units and is not interoperable across existing market car makes/models, limiting its scalability on public roads. To fill these g…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 8 pages, 10 figures

  7. arXiv:2505.05114  [pdf, other]

    eess.AS cs.SD

    Listen to Extract: Onset-Prompted Target Speaker Extraction

    Authors: Pengjie Shen, Kangrui Chen, Shulin He, Pengru Chen, Shuqi Yuan, He Kong, Xueliang Zhang, Zhong-Qiu Wang

    Abstract: We propose $\textit{listen to extract}$ (LExt), a highly effective yet extremely simple algorithm for monaural target speaker extraction (TSE). Given an enrollment utterance of a target speaker, LExt aims at extracting the target speaker from the speaker's mixed speech with other speakers. For each mixture, LExt concatenates an enrollment utterance of the target speaker to the mixture signal at…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: in submission

  8. arXiv:2504.18692  [pdf, ps, other]

    cs.RO eess.SY

    Learning-Based Modeling of Soft Actuators Using Euler Spiral-Inspired Curvature

    Authors: Yu Mei, Shangyuan Yuan, Xinda Qi, Preston Fairchild, Xiaobo Tan

    Abstract: Soft robots, distinguished by their inherent compliance and continuum structures, present unique modeling challenges, especially when subjected to significant external loads such as gravity and payloads. In this study, we introduce an innovative data-driven modeling framework leveraging Euler spiral-inspired shape representations to accurately describe the complex shapes of soft continuum actua…

    Submitted 25 April, 2025; originally announced April 2025.

  9. arXiv:2504.10842  [pdf, other]

    cs.CV eess.IV

    A comprehensive review of remote sensing in wetland classification and mapping

    Authors: Shuai Yuan, Xiangan Liang, Tianwu Lin, Shuang Chen, Rui Liu, Jie Wang, Hongsheng Zhang, Peng Gong

    Abstract: Wetlands constitute critical ecosystems that support both biodiversity and human well-being; however, they have experienced a significant decline since the 20th century. Back in the 1970s, researchers began to employ remote sensing technologies for wetland classification and mapping to elucidate the extent and variations of wetlands. Although some review articles summarized the development of this…

    Submitted 21 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  10. arXiv:2504.04969  [pdf, other]

    eess.SP

    Grouped Target Tracking and Seamless People Counting with a 24 GHz MIMO FMCW

    Authors: Dingyang Wang, Sen Yuan, Alexander Yarovoy, Francesco Fioranelli

    Abstract: The problem of radar-based tracking of groups of people moving together and counting their numbers in indoor environments is considered here. A novel processing pipeline to track groups of people moving together and count their numbers is proposed and validated. The pipeline is specifically designed to deal with frequent changes of direction and stop & go movements typical of indoor activities. Th…

    Submitted 7 April, 2025; originally announced April 2025.

  11. Cache-Aware Cooperative Multicast Beamforming in Dynamic Satellite-Terrestrial Networks

    Authors: Shuo Yuan, Yaohua Sun, Mugen Peng

    Abstract: With the burgeoning demand for data-intensive services, satellite-terrestrial networks (STNs) face increasing backhaul link congestion, deteriorating user quality of service (QoS), and escalating power consumption. Cache-aided STNs are acknowledged as a promising paradigm for accelerating content delivery to users and alleviating the load of backhaul links. However, the dynamic nature of low earth…

    Submitted 22 March, 2025; originally announced March 2025.

    Comments: Accepted by IEEE Transactions on Vehicular Technology

  12. arXiv:2503.17398   

    eess.SY cs.RO

    Reachable Sets-based Trajectory Planning Combining Reinforcement Learning and iLQR

    Authors: Wenjie Huang, Yang Li, Shijie Yuan, Jingjia Teng, Hongmao Qin, Yougang Bian

    Abstract: The driving risk field is applicable to more complex driving scenarios, providing new approaches for safety decision-making and active vehicle control in intricate environments. However, existing research often overlooks the driving risk field and fails to consider the impact of risk distribution within drivable areas on trajectory planning, which poses challenges for enhancing safety. This paper…

    Submitted 20 May, 2025; v1 submitted 19 March, 2025; originally announced March 2025.

    Comments: We sincerely request the withdrawal of this paper. After further research and review, we have found that certain parts of the content contain uncertainties and are not sufficient to support the conclusions previously drawn. To avoid any potential misunderstanding or misguidance to the research community, we have decided to voluntarily withdraw the manuscript

  13. arXiv:2503.06311  [pdf, other]

    eess.SP

    Hybrid CNN-Dilated Self-attention Model Using Inertial and Body-Area Electrostatic Sensing for Gym Workout Recognition, Counting, and User Authentification

    Authors: Sizhen Bian, Vitor Fortes Rey, Siyu Yuan, Paul Lukowicz

    Abstract: While human body capacitance ($HBC$) has been explored as a novel wearable motion sensing modality, its competence has never been quantitatively demonstrated compared to that of the dominant inertial measurement unit ($IMU$) in practical scenarios. This work is thus motivated to evaluate the contribution of $HBC$ in wearable motion sensing. A real-life case study, gym workout tracking, is describe…

    Submitted 8 March, 2025; originally announced March 2025.

  14. arXiv:2502.11946  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

    Authors: Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, Hongyu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, et al. (120 additional authors not shown)

    Abstract: Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contribu…

    Submitted 18 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  15. arXiv:2502.03850  [pdf, other]

    cs.IT eess.SP

    Electromagnetic Channel Modeling and Capacity Analysis for HMIMO Communications

    Authors: Li Wei, Shuai S. A. Yuan, Chongwen Huang, Jianhua Zhang, Faouzi Bader, Zhaoyang Zhang, Sami Muhaidat, Merouane Debbah, Chau Yuen

    Abstract: Advancements in emerging technologies, e.g., reconfigurable intelligent surfaces and holographic MIMO (HMIMO), facilitate unprecedented manipulation of electromagnetic (EM) waves, significantly enhancing the performance of wireless communication systems. To accurately characterize the achievable performance limits of these systems, it is crucial to develop a universal EM-compliant channel model. T…

    Submitted 6 February, 2025; originally announced February 2025.

  16. arXiv:2501.06940  [pdf, other]

    eess.SY

    Collaborative Human Activity Recognition with Passive Inter-Body Electrostatic Field

    Authors: Sizhen Bian, Vitor Fortes Rey, Siyu Yuan, Paul Lukowicz

    Abstract: The passive body-area electrostatic field has recently been aspiringly explored for wearable motion sensing, harnessing its two thrilling characteristics: full-body motion sensitivity and environmental sensitivity, which potentially empowers human activity recognition both independently and jointly from a single sensing front-end and theoretically brings significant competition against traditional…

    Submitted 12 January, 2025; originally announced January 2025.

  17. arXiv:2501.06566  [pdf, other]

    cs.RO eess.SY

    Cooperative Aerial Robot Inspection Challenge: A Benchmark for Heterogeneous Multi-UAV Planning and Lessons Learned

    Authors: Muqing Cao, Thien-Minh Nguyen, Shenghai Yuan, Andreas Anastasiou, Angelos Zacharia, Savvas Papaioannou, Panayiotis Kolios, Christos G. Panayiotou, Marios M. Polycarpou, Xinhang Xu, Mingjie Zhang, Fei Gao, Boyu Zhou, Ben M. Chen, Lihua Xie

    Abstract: We propose the Cooperative Aerial Robot Inspection Challenge (CARIC), a simulation-based benchmark for motion planning algorithms in heterogeneous multi-UAV systems. CARIC features UAV teams with complementary sensors, realistic constraints, and evaluation metrics prioritizing inspection quality and efficiency. It offers a ready-to-use perception-control software stack and diverse scenarios to sup…

    Submitted 14 January, 2025; v1 submitted 11 January, 2025; originally announced January 2025.

    Comments: Please find our website at https://ntu-aris.github.io/caric

  18. arXiv:2412.16928  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    AV-DTEC: Self-Supervised Audio-Visual Fusion for Drone Trajectory Estimation and Classification

    Authors: Zhenyuan Xiao, Yizhuo Yang, Guili Xu, Xianglong Zeng, Shenghai Yuan

    Abstract: The increasing use of compact UAVs has created significant threats to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we propose AV-DTEC, a lightweight self-supervised audio-visual fusion-based anti-UAV system. AV-DTEC is trained using self-supervised learning with labels generated by LiDAR, and it simultaneously learns audio and vi…

    Submitted 22 December, 2024; originally announced December 2024.

    Comments: Submitted to ICRA 2025

  19. arXiv:2412.12698  [pdf, other]

    cs.RO cs.SD eess.AS

    Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling

    Authors: Allen Lei, Tianchen Deng, Han Wang, Jianfei Yang, Shenghai Yuan

    Abstract: As small unmanned aerial vehicles (UAVs) become increasingly prevalent, there is growing concern regarding their impact on public safety and privacy, highlighting the need for advanced tracking and trajectory estimation solutions. In response, this paper introduces a novel framework that utilizes audio array for 3D UAV trajectory estimation. Our approach incorporates a self-supervised learning mod…

    Submitted 19 January, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: Accepted for ICASSP

  20. arXiv:2411.16909  [pdf, other]

    eess.SY

    Graph-based Simulation Framework for Power Resilience Estimation and Enhancement

    Authors: Xuesong Wang, Shuo Yuan, Sharaf K. Magableh, Oraib Dawaghreh, Caisheng Wang, Le Yi Wang

    Abstract: The increasing frequency of extreme weather events poses significant risks to power distribution systems, leading to widespread outages and severe economic and social consequences. This paper presents a novel simulation framework for assessing and enhancing the resilience of power distribution networks under such conditions. Resilience is estimated through Monte Carlo simulations, which simulate e…

    Submitted 11 February, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

    Comments: Accepted to 2025 IEEE PES General Meeting

  21. Electromagnetic Modeling and Capacity Analysis of Rydberg Atom-Based MIMO System

    Authors: Shuai S. A. Yuan, Xinyi Y. I. Xu, Jinpeng Yuan, Guoda Xie, Chongwen Huang, Xiaoming Chen, Zhixiang Huang, Wei E. I. Sha

    Abstract: Rydberg atom-based antennas exploit the quantum properties of highly excited Rydberg atoms, providing unique advantages over classical antennas, such as high sensitivity, broad frequency range, and compact size. Despite the increasing interests in their applications in antenna and communication engineering, two key properties, involving the lack of polarization multiplexing and isotropic reception…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: in IEEE Antennas and Wireless Propagation Letters, 2025

  22. arXiv:2411.05361  [pdf, ps, other]

    cs.CL eess.AS

    Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

    Authors: Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Chih-Kai Yang, Wenze Ren, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Fabian Ritter-Gutierrez, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Ming To Chuang, et al. (55 additional authors not shown)

    Abstract: Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluati…

    Submitted 9 June, 2025; v1 submitted 8 November, 2024; originally announced November 2024.

    Comments: ICLR 2025

  23. arXiv:2410.21233  [pdf, other]

    cs.SD eess.AS

    ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization

    Authors: Christian J. Steinmetz, Shubhr Singh, Marco Comunità, Ilias Ibnyahya, Shanxin Yuan, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Audio production style transfer is the task of processing an input to impart stylistic elements from a reference recording. Existing approaches often train a neural network to estimate control parameters for a set of audio effects. However, these approaches are limited in that they can only control a fixed set of effects, where the effects must be differentiable or otherwise employ specialized tra…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: Accepted to ISMIR 2024. Code available https://github.com/csteinmetz1/st-ito

  24. arXiv:2410.15869  [pdf, other]

    cs.RO eess.SY

    Robust Loop Closure by Textual Cues in Challenging Environments

    Authors: Tongxing Jin, Thien-Minh Nguyen, Xinhang Xu, Yizhuo Yang, Shenghai Yuan, Jianping Li, Lihua Xie

    Abstract: Loop closure is an important task in robot navigation. However, existing methods mostly rely on some implicit or heuristic features of the environment, which can still fail to work in common environments such as corridors, tunnels, and warehouses. Indeed, navigating in such featureless, degenerative, and repetitive (FDR) environments would also pose a significant challenge even for humans, but exp…

    Submitted 21 October, 2024; originally announced October 2024.

  25. arXiv:2410.11511  [pdf, other]

    eess.IV cs.CV

    Rician Denoising Diffusion Probabilistic Models For Sodium Breast MRI Enhancement

    Authors: Shuaiyu Yuan, Tristan Whitmarsh, Dimitri A Kessler, Otso Arponen, Mary A McLean, Gabrielle Baxter, Frank Riemer, Aneurin J Kennerley, William J Brackenbury, Fiona J Gilbert, Joshua D Kaggie

    Abstract: Sodium MRI is an imaging technique used to visualize and quantify sodium concentrations in vivo, playing a role in many biological processes and potentially aiding in breast cancer characterization. Sodium MRI, however, suffers from inherently low signal-to-noise ratios (SNR) and spatial resolution, compared with conventional proton MRI. A deep-learning method, the Denoising Diffusion Probabilisti…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: 3 figures

    ACM Class: I.4.3

  26. Electromagnetic Normalization of Channel Matrix for Holographic MIMO Communications

    Authors: Shuai S. A. Yuan, Li Wei, Xiaoming Chen, Chongwen Huang, Wei E. I. Sha

    Abstract: Holographic multiple-input and multiple-output (MIMO) communications introduce innovative antenna array configurations, such as dense arrays and volumetric arrays, which offer notable advantages over conventional planar arrays with half-wavelength element spacing. However, accurately assessing the performance of these new holographic MIMO systems necessitates careful consideration of channel matri…

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: in IEEE Transactions on Wireless Communications, 2025

  27. arXiv:2409.01199  [pdf, other]

    cs.CV eess.IV

    OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinhua Cheng, Li Yuan

    Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ign…

    Submitted 9 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

  28. arXiv:2408.16481  [pdf, other]

    eess.IV cs.CV

    A Deep-Learning-Based Label-free No-Reference Image Quality Assessment Metric: Application in Sodium MRI Denoising

    Authors: Shuaiyu Yuan, Tristan Whitmarsh, Dimitri A Kessler, Otso Arponen, Mary A McLean, Gabrielle Baxter, Frank Riemer, Aneurin J Kennerley, William J Brackenbury, Fiona J Gilbert, Joshua D Kaggie

    Abstract: New multinuclear MRI techniques, such as sodium MRI, generally suffer from low image quality due to an inherently low signal. Postprocessing methods, such as image denoising, have been developed for image enhancement. However, the assessment of these enhanced images is challenging, especially when there is a lack of high-resolution, high-signal images as reference, such as in sodium…

    Submitted 2 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

    Comments: 13 pages, 3 figures

  29. arXiv:2408.13800  [pdf, other]

    eess.IV cs.CV

    BCDNet: A Fast Residual Neural Network For Invasive Ductal Carcinoma Detection

    Authors: Yujia Lin, Aiwei Lian, Mingyu Liao, Shuangjie Yuan

    Abstract: It is of great significance to diagnose Invasive Ductal Carcinoma (IDC), the most common subtype of breast cancer, at an early stage. Although the powerful models in the Computer-Aided Diagnosis (CAD) systems provide promising results, it is still difficult to integrate them into other medical devices or use them without sufficient computational resources. In this paper, we propose BCDNet, which…

    Submitted 6 November, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: 5 pages, 3 figures

  30. arXiv:2406.13705  [pdf, other]

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  31. arXiv:2405.16516  [pdf, other]

    eess.IV cs.CV

    Memory-efficient High-resolution OCT Volume Synthesis with Cascaded Amortized Latent Diffusion Models

    Authors: Kun Huang, Xiao Ma, Yuhan Zhang, Na Su, Songtao Yuan, Yong Liu, Qiang Chen, Huazhu Fu

    Abstract: Optical coherence tomography (OCT) image analysis plays an important role in the field of ophthalmology. Current successful analysis models rely on available large datasets, which can be challenging to obtain for certain tasks. The use of deep generative models to create realistic data emerges as a promising approach. However, due to limitations in hardware resources, it is still difficult t…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Provisionally accepted for medical image computing and computer-assisted intervention (MICCAI) 2024

  32. arXiv:2405.07218  [pdf, other]

    physics.med-ph eess.SY

    Chained Flexible Capsule Endoscope: Unraveling the Conundrum of Size Limitations and Functional Integration for Gastrointestinal Transitivity

    Authors: Sishen Yuan, Guang Li, Baijia Liang, Lailu Li, Qingzhuo Zheng, Shuang Song, Zhen Li, Hongliang Ren

    Abstract: Capsule endoscopes, predominantly serving diagnostic functions, provide lucid internal imagery but are devoid of surgical or therapeutic capabilities. Consequently, despite lesion detection, physicians frequently resort to traditional endoscopic or open surgical procedures for treatment, resulting in more complex, potentially risky interventions. To surmount these limitations, this study introduce…

    Submitted 12 May, 2024; originally announced May 2024.

  33. arXiv:2405.07216  [pdf, other]

    eess.SY

    Magnetic-Guided Flexible Origami Robot toward Long-Term Phototherapy of H. pylori in the Stomach

    Authors: Sishen Yuan, Baijia Liang, Po Wa Wong, Mingjing Xu, Chi Hsuan Li, Zhen Li, Hongliang Ren

    Abstract: Helicobacter pylori, a pervasive bacterial infection associated with gastrointestinal disorders such as gastritis, peptic ulcer disease, and gastric cancer, impacts approximately 50% of the global population. The efficacy of standard clinical eradication therapies is diminishing due to the rise of antibiotic-resistant strains, necessitating alternative treatment strategies. Photodynamic therapy (P…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: IEEE ICRA 2024

  34. arXiv:2403.17460  [pdf, other]

    eess.IV cs.CV

    Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

    Authors: Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu

    Abstract: Reference-based super-resolution (RefSR) has the potential to build bridges across spatial and temporal resolutions of remote sensing images. However, existing RefSR methods are limited by the faithfulness of content reconstruction and the effectiveness of texture transfer in large scaling factors. Conditional diffusion models have opened up new opportunities for generating realistic high-resoluti…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  35. arXiv:2402.18527  [pdf, other]

    cs.CV cs.LG eess.IV

    Defect Detection in Tire X-Ray Images: Conventional Methods Meet Deep Structures

    Authors: Andrei Cozma, Landon Harris, Hairong Qi, Ping Ji, Wenpeng Guo, Song Yuan

    Abstract: This paper introduces a robust approach for automated defect detection in tire X-ray images by harnessing traditional feature extraction methods such as Local Binary Pattern (LBP) and Gray Level Co-Occurrence Matrix (GLCM) features, as well as Fourier and Wavelet-based features, complemented by advanced machine learning techniques. Recognizing the challenges inherent in the complex patterns and te…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 2 figures, 3 tables, submitted to ICIP2024

    ACM Class: I.4.7; I.4.9; I.4.0

  36. DeepLight: Reconstructing High-Resolution Observations of Nighttime Light With Multi-Modal Remote Sensing Data

    Authors: Lixian Zhang, Runmin Dong, Shuai Yuan, Jinxiao Zhang, Mengxuan Chen, Juepeng Zheng, Haohuan Fu

    Abstract: Nighttime light (NTL) remote sensing observation serves as a unique proxy for quantitatively assessing progress toward meeting a series of Sustainable Development Goals (SDGs), such as poverty estimation, urban sustainable development, and carbon emission. However, existing NTL observations often suffer from pervasive degradation and inconsistency, limiting their utility for computing the indicato…

    Submitted 23 May, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted in IJCAI 2024

  37. 3D high-resolution imaging algorithm using 1D MIMO array for autonomous driving application

    Authors: Sen Yuan, Francesco Fioranelli, Alexander Yarovoy

    Abstract: The problem of 3D high-resolution imaging in automotive multiple-input multiple-output (MIMO) side-looking radar using a 1D array is considered. The concept of motion-enhanced snapshots is introduced for generating larger apertures in the azimuth dimension. For the first time, 3D imaging capabilities can be achieved with high angular resolution using a 1D MIMO antenna array, which can alleviate th…

    Submitted 28 November, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Submitted to IEEE Transactions. For citation, please use the IEEE doi: 10.1109/TRS.2024.3493992

    Journal ref: in IEEE Transactions on Radar Systems, vol. 2, pp. 1186-1199, 2024

  38. arXiv:2402.09101  [pdf, other]

    eess.IV cs.CV

    DestripeCycleGAN: Stripe Simulation CycleGAN for Unsupervised Infrared Image Destriping

    Authors: Shiqi Yang, Hanlin Qin, Shuai Yuan, Xiang Yan, Hossein Rahmani

    Abstract: CycleGAN has been proven to be an advanced approach for unsupervised image restoration. This framework consists of two generators: a denoising one for inference and an auxiliary one for modeling noise to fulfill cycle-consistency constraints. However, when applied to the infrared destriping task, it becomes challenging for the vanilla auxiliary generator to consistently produce vertical noise unde…

    Submitted 14 February, 2024; originally announced February 2024.

  39. Contingency Detection in Modern Power Systems: A Stochastic Hybrid System Method

    Authors: Shuo Yuan, Le Yi Wang, George Yin, Masoud H. Nazari

    Abstract: This paper introduces a new stochastic hybrid system (SHS) framework for contingency detection in modern power systems (MPS). The framework uses stochastic hybrid system representations in state space models to expand and facilitate capability of contingency detection. In typical microgrids (MGs), buses may contain various synchronous generators, renewable generators, controllable loads, battery s…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures. arXiv admin note: text overlap with arXiv:2401.16568

  40. arXiv:2401.16568  [pdf, ps, other]

    eess.SY

    Stochastic Hybrid System Modeling and State Estimation of Modern Power Systems under Contingency

    Authors: Shuo Yuan, Le Yi Wang, George Yin, Masoud H. Nazari

    Abstract: This paper introduces a stochastic hybrid system (SHS) framework in state space model to capture sensor, communication, and system contingencies in modern power systems (MPS). Within this new framework, the paper concentrates on the development of state estimation methods and algorithms to provide reliable state estimation under randomly intermittent and noisy sensor data. MPSs employ diversified…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 15 pages, 9 figures

  41. Joint Beam Direction Control and Radio Resource Allocation in Dynamic Multi-beam LEO Satellite Networks

    Authors: Shuo Yuan, Yaohua Sun, Mugen Peng, Renzhi Yuan

    Abstract: Multi-beam low earth orbit (LEO) satellites are emerging as key components in beyond 5G and 6G to provide global coverage and high data rate. To fully unleash the potential of LEO satellite communication, resource management plays a key role. However, the uneven distribution of users, the coupling of multi-dimensional resources, complex inter-beam interference, and time-varying network topologies…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Vehicular Technology

  42. arXiv:2311.02927  [pdf]

    eess.IV physics.bio-ph

    Auto-ICell: An Accessible and Cost-Effective Integrative Droplet Microfluidic System for Real-Time Single-Cell Morphological and Apoptotic Analysis

    Authors: Yuanyuan Wei, Meiai Lin, Shanhang Luo, Syed Muhammad Tariq Abbasi, Liwei Tan, Guangyao Cheng, Bijie Bai, Yi-Ping Ho, Scott Wu Yuan, Ho-Pui Ho

    Abstract: The Auto-ICell system, a novel and cost-effective integrated droplet microfluidic system, is introduced for real-time analysis of single-cell morphology and apoptosis. This system integrates a 3D-printed microfluidic chip with image analysis algorithms, enabling the generation of uniform droplet reactors and immediate image analysis. The system employs a color-based image analysis algorithm in th…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 22 pages, 5 figures

  43. Joint Network Function Placement and Routing Optimization in Dynamic Software-defined Satellite-Terrestrial Integrated Networks

    Authors: Shuo Yuan, Yaohua Sun, Mugen Peng

    Abstract: Software-defined satellite-terrestrial integrated networks (SDSTNs) are seen as a promising paradigm for achieving high resource flexibility and global communication coverage. However, low latency service provisioning is still challenging due to the fast variation of network topology and limited onboard resource at low earth orbit satellites. To address this issue, we study service provisioning in…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Transactions on Wireless Communications

  44. arXiv:2309.01384  [pdf]

    q-bio.QM eess.IV eess.SY

    Deep Learning Approach for Large-Scale, Real-Time Quantification of Green Fluorescent Protein-Labeled Biological Samples in Microreactors

    Authors: Yuanyuan Wei, Sai Mu Dalike Abaxi, Nawaz Mehmood, Luoquan Li, Fuyang Qu, Guangyao Cheng, Dehua Hu, Yi-Ping Ho, Scott Wu Yuan, Ho-Pui Ho

    Abstract: Absolute quantification of biological samples entails determining expression levels in precise numerical copies, offering enhanced accuracy and superior performance for rare templates. However, existing methodologies suffer from significant limitations: flow cytometers are both costly and intricate, while fluorescence imaging relying on software tools or manual counting is time-consuming and prone…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 23 pages, 6 figures, 1 table

  45. arXiv:2308.06432  [pdf, other]

    eess.IV cs.CV cs.LG

    Learn Single-horizon Disease Evolution for Predictive Generation of Post-therapeutic Neovascular Age-related Macular Degeneration

    Authors: Yuhan Zhang, Kun Huang, Mingchao Li, Songtao Yuan, Qiang Chen

    Abstract: Most of the existing disease prediction methods in the field of medical image processing fall into two classes, namely image-to-category predictions and image-to-parameter predictions. Few works have focused on image-to-image predictions. Different from multi-horizon predictions in other fields, ophthalmologists prefer to show more confidence in single-horizon predictions due to the low tolerance…

    Submitted 11 August, 2023; originally announced August 2023.

  46. arXiv:2306.15857  [pdf, other]

    eess.SP

    GeXSe (Generative Explanatory Sensor System): An Interpretable Deep Generative Model for Human Activity Recognition in Smart Spaces

    Authors: Sun Yuan, Salami Pargoo Navid, Ortiz Jorge

    Abstract: We introduce GeXSe (Generative Explanatory Sensor System), a novel framework designed to extract interpretable sensor-based and vision domain features from non-invasive smart space sensors. We combine these to provide a comprehensive explanation of sensor-activation patterns in activity recognition tasks. This system leverages advanced machine learning architectures, including transformer blocks,…

    Submitted 4 February, 2025; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: 29 pages, 17 figures

    ACM Class: I.5.4

  47. arXiv:2306.05704  [pdf, other]

    cs.CV cs.MM eess.IV

    Exploring Effective Mask Sampling Modeling for Neural Image Compression

    Authors: Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wengang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian

    Abstract: Image compression aims to reduce the information redundancy in images. Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy, but rarely address the channel redundancy. Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: 10 pages

  48. arXiv:2305.10345  [pdf, other]

    eess.SP cs.AI cs.CV cs.MM

    MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing

    Authors: Jianfei Yang, He Huang, Yunjiao Zhou, Xinyan Chen, Yuecong Xu, Shenghai Yuan, Han Zou, Chris Xiaoxuan Lu, Lihua Xie

    Abstract: 4D human perception plays an essential role in a myriad of applications, such as home automation and metaverse avatar simulation. However, existing solutions which mainly rely on cameras and wearable devices are either privacy intrusive or inconvenient to use. To address these issues, wireless sensing has emerged as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals for devi…

    Submitted 24 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

    Comments: The paper has been accepted by NeurIPS 2023 Datasets and Benchmarks Track. Project page: https://ntu-aiot-lab.github.io/mm-fi

  49. arXiv:2305.09833  [pdf, other]

    eess.IV cs.CV

    Segmentation of Aortic Vessel Tree in CT Scans with Deep Fully Convolutional Networks

    Authors: Shaofeng Yuan, Feng Yang

    Abstract: Automatic and accurate segmentation of aortic vessel tree (AVT) in computed tomography (CT) scans is crucial for early detection, diagnosis and prognosis of aortic diseases, such as aneurysms, dissections and stenosis. However, this task remains challenging, due to the complexity of aortic vessel tree and amount of CT angiography data. In this technical report, we use two-stage fully convolutional…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 7 pages, 1 figure, 1 table

  50. TelecomTM: A Fine-Grained and Ubiquitous Traffic Monitoring System Using Pre-Existing Telecommunication Fiber-Optic Cables as Sensors

    Authors: Jingxiao Liu, Siyuan Yuan, Yiwen Dong, Biondo Biondi, Hae Young Noh

    Abstract: We introduce the TelecomTM system that uses pre-existing telecommunication fiber-optic cables as virtual strain sensors to sense vehicle-induced ground vibrations for fine-grained and ubiquitous traffic monitoring and characterization. Here we call it a virtual sensor because it is a software-based representation of a physical sensor. Due to the extensively installed telecommunication fiber-optic…

    Submitted 4 May, 2023; originally announced May 2023.