Showing 1–50 of 268 results for author: Ma, J

Searching in archive eess.
  1. arXiv:2507.05317 [pdf, ps, other]

    eess.IV cs.AI cs.CV

    PWD: Prior-Guided and Wavelet-Enhanced Diffusion Model for Limited-Angle CT

    Authors: Yi Liu, Yiyang Wen, Zekun Zhou, Junqi Ma, Linghang Wang, Yucheng Yao, Liu Shi, Qiegen Liu

    Abstract: Generative diffusion models have received increasing attention in medical imaging, particularly in limited-angle computed tomography (LACT). Standard diffusion models achieve high-quality image reconstruction but require a large number of sampling steps during inference, resulting in substantial computational overhead. Although skip-sampling strategies have been proposed to improve efficiency, the…

    Submitted 10 July, 2025; v1 submitted 30 June, 2025; originally announced July 2025.

  2. arXiv:2507.04891 [pdf, ps, other]

    eess.IV cs.CV

    MurreNet: Modeling Holistic Multimodal Interactions Between Histopathology and Genomic Profiles for Survival Prediction

    Authors: Mingxin Liu, Chengfei Cai, Jun Li, Pengbo Xu, Jinze Li, Jiquan Ma, Jun Xu

    Abstract: Cancer survival prediction requires integrating pathological Whole Slide Images (WSIs) and genomic profiles, a challenging task due to the inherent heterogeneity and the complexity of modeling both inter- and intra-modality interactions. Current methods often employ straightforward fusion strategies for multimodal feature integration, failing to comprehensively capture modality-specific and modali…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 11 pages, 2 figures, Accepted by MICCAI 2025

  3. arXiv:2506.22710 [pdf, ps, other]

    cs.CV eess.IV

    LightBSR: Towards Lightweight Blind Super-Resolution via Discriminative Implicit Degradation Representation Learning

    Authors: Jiang Yuan, JI Ma, Bo Wang, Guanzhou Ke, Weiming Hu

    Abstract: Implicit degradation estimation-based blind super-resolution (IDE-BSR) hinges on extracting the implicit degradation representation (IDR) of the LR image and adapting it to LR image features to guide HR detail restoration. Although IDE-BSR has shown potential in dealing with noise interference and complex degradations, existing methods ignore the importance of IDR discriminability for BSR and inst…

    Submitted 27 June, 2025; originally announced June 2025.

    Journal ref: International Conference on Computer Vision (ICCV) 2025

  4. arXiv:2506.22012 [pdf, ps, other]

    eess.IV cs.CV

    Noise-Inspired Diffusion Model for Generalizable Low-Dose CT Reconstruction

    Authors: Qi Gao, Zhihao Chen, Dong Zeng, Junping Zhang, Jianhua Ma, Hongming Shan

    Abstract: The generalization of deep learning-based low-dose computed tomography (CT) reconstruction models to doses unseen in the training data is important and remains challenging. Previous efforts heavily rely on paired data to improve the generalization performance and robustness through collecting either diverse CT data for re-training or a few test data for fine-tuning. Recently, diffusion models have…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: Accepted for publication in Medical Image Analysis, 2025

  5. arXiv:2506.21535 [pdf, ps, other]

    eess.IV cs.CV cs.LG

    Exploring the Design Space of 3D MLLMs for CT Report Generation

    Authors: Mohammed Baharoon, Jun Ma, Congyu Fang, Augustin Toma, Bo Wang

    Abstract: Multimodal Large Language Models (MLLMs) have emerged as a promising way to automate Radiology Report Generation (RRG). In this work, we systematically investigate the design space of 3D MLLMs, including visual input representation, projectors, Large Language Models (LLMs), and fine-tuning techniques for 3D CT report generation. We also introduce two knowledge-based report augmentation methods tha…

    Submitted 26 June, 2025; originally announced June 2025.

  6. arXiv:2506.18460 [pdf, ps, other]

    eess.SY

    Networked pointing system: Bearing-only target localization and pointing control

    Authors: Shiyao Li, Bo Zhu, Yining Zhou, Jie Ma, Baoqing Yang, Fenghua He

    Abstract: In the paper, we formulate the target-pointing consensus problem where the headings of agents are required to point at a common target. Only a few agents in the network can measure the bearing information of the target. A two-step solution consisting of a bearing-only estimator for target localization and a control law for target pointing is constructed to address this problem. Compared to the str…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: IFAC Conference on Networked Systems, 2025

  7. arXiv:2506.10866 [pdf, ps, other]

    eess.SY

    Data-Driven Model Reduction by Moment Matching for Linear and Nonlinear Parametric Systems

    Authors: Hanqing Zhang, Junyu Mao, Mohammad Fahim Shakib, Giordano Scarciotti

    Abstract: Theory and methods to obtain parametric reduced-order models by moment matching are presented. The definition of the parametric moment is introduced, and methods (model-based and data-driven) for the approximation of the parametric moment of linear and nonlinear parametric systems are proposed. These approximations are exploited to construct families of parametric reduced-order models that match t…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 16 pages, 6 figures, submitted to IEEE Transactions on Automatic Control

  8. arXiv:2506.00733 [pdf, ps, other]

    eess.AS cs.SD

    Quantifying and Reducing Speaker Heterogeneity within the Common Voice Corpus for Phonetic Analysis

    Authors: Miao Zhang, Aref Farhadipour, Annie Baker, Jiachen Ma, Bogdan Pricop, Eleanor Chodroff

    Abstract: With its crosslinguistic and cross-speaker diversity, the Mozilla Common Voice Corpus (CV) has been a valuable resource for multilingual speech technology and holds tremendous potential for research in crosslinguistic phonetics and speech sciences. Properly accounting for speaker variation is, however, key to the theoretical and statistical bases of speech research. While CV provides a client ID a…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted for Interspeech 2025

  9. arXiv:2505.24486 [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.LG eess.AS

    Rehearsal with Auxiliary-Informed Sampling for Audio Deepfake Detection

    Authors: Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia

    Abstract: The performance of existing audio deepfake detection frameworks degrades when confronted with new deepfake attacks. Rehearsal-based continual learning (CL), which updates models using a limited set of old data samples, helps preserve prior knowledge while incorporating new information. However, existing rehearsal techniques don't effectively capture the diversity of audio characteristics, introduc…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  10. arXiv:2505.08556 [pdf]

    eess.SY

    A High-Efficiency Reconfigurable Bidirectional Array Antenna Based on Transmit-Reflect Switchable Metasurface

    Authors: Fan Qin, Jinyang Bi, Jiao Ma, Chao Gu, Hailin Zhang, Wenchi Cheng, Steven Gao

    Abstract: This paper proposes a reconfigurable bidirectional array antenna with high-efficiency radiations and flexible beam-switching capability by employing a novel transmit-reflect switchable metasurface (TRSM). To realize the electromagnetic (EM) wave transmitted or reflected manipulation, a dedicated transmit-reflect switch layer (TRSL) with periodically soldered PIN diodes is introduced between two tr…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: 11 pages, 18 figures, published to TAP

  11. arXiv:2505.03380 [pdf, other]

    cs.CV cs.AI eess.IV

    Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant

    Authors: Haonan Wang, Jiaji Mao, Lehan Wang, Qixiang Zhang, Marawan Elbatel, Yi Qin, Huijun Hu, Baoxun Li, Wenhui Deng, Weifeng Qin, Hongrui Li, Jialin Liang, Jun Shen, Xiaomeng Li

    Abstract: Medical AI assistants support doctors in disease diagnosis, medical image analysis, and report generation. However, they still face significant challenges in clinical use, including limited accuracy with multimodal content and insufficient validation in real-world settings. We propose RCMed, a full-stack AI assistant that improves multimodal alignment in both input and output, enabling precise ana…

    Submitted 6 May, 2025; originally announced May 2025.

  12. arXiv:2505.01768 [pdf, ps, other]

    eess.IV cs.CV

    Continuous Filtered Backprojection by Learnable Interpolation Network

    Authors: Hui Lin, Dong Zeng, Qi Xie, Zerui Mao, Jianhua Ma, Deyu Meng

    Abstract: Accurate reconstruction of computed tomography (CT) images is crucial in medical imaging field. However, there are unavoidable interpolation errors in the backprojection step of the conventional reconstruction methods, i.e., filtered-back-projection based methods, which are detrimental to the accurate reconstruction. In this study, to address this issue, we propose a novel deep learning model, nam…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 14 pages, 10 figures

  13. arXiv:2504.13010 [pdf, other]

    eess.SP

    Simultaneous Polysomnography and Cardiotocography Reveal Temporal Correlation Between Maternal Obstructive Sleep Apnea and Fetal Hypoxia

    Authors: Jingyu Wang, Donglin Xie, Jingying Ma, Yunliang Sun, Linyan Zhang, Rui Bai, Zelin Tu, Liyue Xu, Jun Wei, Jingjing Yang, Yanan Liu, Huijie Yi, Bing Zhou, Long Zhao, Xueli Zhang, Mengling Feng, Xiaosong Dong, Guoli Liu, Fang Han, Shenda Hong

    Abstract: Background: Obstructive sleep apnea syndrome (OSAS) during pregnancy is common and can negatively affect fetal outcomes. However, studies on the immediate effects of maternal hypoxia on fetal heart rate (FHR) changes are lacking. Methods: We used time-synchronized polysomnography (PSG) and cardiotocography (CTG) data from two cohorts to analyze the correlation between maternal hypoxia and FHR chan…

    Submitted 17 April, 2025; originally announced April 2025.

  14. arXiv:2504.08240 [pdf, other]

    cs.RO eess.SP

    InSPE: Rapid Evaluation of Heterogeneous Multi-Modal Infrastructure Sensor Placement

    Authors: Zhaoliang Zheng, Yun Zhang, Zongling Meng, Johnson Liu, Xin Xia, Jiaqi Ma

    Abstract: Infrastructure sensing is vital for traffic monitoring at safety hotspots (e.g., intersections) and serves as the backbone of cooperative perception in autonomous driving. While vehicle sensing has been extensively studied, infrastructure sensing has received little attention, especially given the unique challenges of diverse intersection geometries, complex occlusions, varying traffic conditions,…

    Submitted 10 April, 2025; originally announced April 2025.

  15. arXiv:2504.03600 [pdf, other]

    eess.IV cs.AI cs.CV

    MedSAM2: Segment Anything in 3D Medical Images and Videos

    Authors: Jun Ma, Zongxin Yang, Sumin Kim, Bihui Chen, Mohammed Baharoon, Adibvafa Fallahpour, Reza Asakereh, Hongwei Lyu, Bo Wang

    Abstract: Medical image and video segmentation is a critical task for precision medicine, which has witnessed considerable progress in developing task or modality-specific and generalist models for 2D images. However, there have been limited studies on building general-purpose models for 3D images and videos with comprehensive user studies. Here, we present MedSAM2, a promptable segmentation foundation mode…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: https://medsam2.github.io/

  16. arXiv:2504.00769 [pdf, other]

    math.OC eess.SP

    A Unified Theoretic and Algorithmic Framework for Solving Multivariate Linear Model with $\ell^1$-norm Approximation

    Authors: Zhi-Qiang Feng, Hong-Yan Zhanga, Ji Ma, Daniel Delahaye, Ruo-Shi Yang, Man Liang

    Abstract: It is a challenging problem that solving the \textit{multivariate linear model} (MLM) $\mathbf{A}\mathbf{x}=\mathbf{b}$ with the $\ell_1$-norm approximation method such that $||\mathbf{A}\mathbf{x}-\mathbf{b}||_1$, the $\ell_1$-norm of the \textit{residual error vector} (REV), is minimized. In this work, our contributions lie in two aspects: firstly, the equivalence theorem for the structure of t…

    Submitted 19 May, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: 21 pages, 6 figures, 2 tables

  17. arXiv:2503.23893 [pdf, other]

    cs.LG cs.AI cs.CV eess.IV

    DiffScale: Continuous Downscaling and Bias Correction of Subseasonal Wind Speed Forecasts using Diffusion Models

    Authors: Maximilian Springenberg, Noelia Otero, Yuxin Xue, Jackie Ma

    Abstract: Renewable resources are strongly dependent on local and large-scale weather situations. Skillful subseasonal to seasonal (S2S) forecasts -- beyond two weeks and up to two months -- can offer significant socioeconomic advantages to the energy sector. This study aims to enhance wind speed predictions using a diffusion model with classifier-free guidance to downscale S2S forecasts of surface wind spe…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 28 pages, 18 figures, preprint under review

  18. arXiv:2503.19505 [pdf, other]

    eess.IV cs.CV

    Single-Step Latent Consistency Model for Remote Sensing Image Super-Resolution

    Authors: Xiaohui Sun, Jiangwei Mo, Hanlin Wu, Jie Ma

    Abstract: Recent advancements in diffusion models (DMs) have greatly advanced remote sensing image super-resolution (RSISR). However, their iterative sampling processes often result in slow inference speeds, limiting their application in real-time tasks. To address this challenge, we propose the latent consistency model for super-resolution (LCMSR), a novel single-step diffusion approach designed to enhance…

    Submitted 25 March, 2025; originally announced March 2025.

  19. arXiv:2503.09024 [pdf, other]

    cs.RO eess.IV eess.SY

    Traffic Regulation-aware Path Planning with Regulation Databases and Vision-Language Models

    Authors: Xu Han, Zhiwen Wu, Xin Xia, Jiaqi Ma

    Abstract: This paper introduces and tests a framework integrating traffic regulation compliance into automated driving systems (ADS). The framework enables ADS to follow traffic laws and make informed decisions based on the driving environment. Using RGB camera inputs and a vision-language model (VLM), the system generates descriptive text to support a regulation-aware decision-making process, ensuring lega…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: 7 pages, 7 figures, submitted to ICRA

  20. arXiv:2503.04563 [pdf, ps, other]

    cs.RO eess.SY

    Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments

    Authors: Minzhe Zheng, Lei Zheng, Jun Ma

    Abstract: Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary c…

    Submitted 7 July, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  21. arXiv:2503.03774 [pdf, other]

    cs.AI cs.GT cs.RO eess.SY

    Fair Play in the Fast Lane: Integrating Sportsmanship into Autonomous Racing Systems

    Authors: Zhenmin Huang, Ce Hao, Wei Zhan, Jun Ma, Masayoshi Tomizuka

    Abstract: Autonomous racing has gained significant attention as a platform for high-speed decision-making and motion control. While existing methods primarily focus on trajectory planning and overtaking strategies, the role of sportsmanship in ensuring fair competition remains largely unexplored. In human racing, rules such as the one-motion rule and the enough-space rule prevent dangerous and unsportsmanli…

    Submitted 12 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  22. arXiv:2503.03346 [pdf, other]

    eess.SY

    SEAL: Safety Enhanced Trajectory Planning and Control Framework for Quadrotor Flight in Complex Environments

    Authors: Yiming Wang, Jianbin Ma, Junda Wu, Huizhe Li, Zhexuan Zhou, Youmin Gong, Jie Mei, Guangfu Ma

    Abstract: For quadrotors, achieving safe and autonomous flight in complex environments with wind disturbances and dynamic obstacles still faces significant challenges. Most existing methods address wind disturbances in either trajectory planning or control, which may lead to hazardous situations during flight. The emergence of dynamic obstacles would further worsen the situation. Therefore, we propose an ef…

    Submitted 5 March, 2025; originally announced March 2025.

  23. arXiv:2502.18519 [pdf, other]

    eess.IV cs.AI cs.CV

    FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition

    Authors: Linshan Wu, Jiaxin Zhuang, Yanning Zhou, Sunan He, Jiabo Ma, Luyang Luo, Xi Wang, Xuefeng Ni, Xiaoling Zhong, Mingxiang Wu, Yinghua Zhao, Xiaohui Duan, Varut Vardhanabhuti, Pranav Rajpurkar, Hao Chen

    Abstract: Tumor is a leading cause of death worldwide, with an estimated 10 million deaths attributed to tumor-related diseases every year. AI-driven tumor recognition unlocks new possibilities for more precise and intelligent tumor screening and diagnosis. However, the progress is heavily hampered by the scarcity of annotated datasets, which demands extensive annotation efforts by radiologists. To tackle t…

    Submitted 23 February, 2025; originally announced February 2025.

  24. arXiv:2502.09662 [pdf, other]

    q-bio.QM cs.CV eess.IV

    Generalizable Cervical Cancer Screening via Large-scale Pretraining and Test-Time Adaptation

    Authors: Hao Jiang, Cheng Jin, Huangjing Lin, Yanning Zhou, Xi Wang, Jiabo Ma, Li Ding, Jun Hou, Runsheng Liu, Zhizhong Chai, Luyang Luo, Huijuan Shi, Yinling Qian, Qiong Wang, Changzhong Li, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Cervical cancer is a leading malignancy in female reproductive system. While AI-assisted cytology offers a cost-effective and non-invasive screening solution, current systems struggle with generalizability in complex clinical scenarios. To address this issue, we introduced Smart-CCS, a generalizable Cervical Cancer Screening paradigm based on pretraining and adaptation to create robust and general…

    Submitted 12 February, 2025; originally announced February 2025.

  25. arXiv:2502.05808 [pdf, ps, other]

    eess.SP

    Positioning-Aided Channel Estimation for Multi-LEO Satellite Cooperative Communications

    Authors: Yuchen Zhang, Pinjun Zheng, Jie Ma, Henk Wymeersch, Tareq Y. Al-Naffouri

    Abstract: We investigate a multi-low Earth orbit (LEO) satellite system that simultaneously provides positioning and communication services to terrestrial user terminals. To address the challenges of channel estimation in LEO satellite systems, we propose a novel two-timescale positioning-aided channel estimation framework, exploiting the distinct variation rates of position-related parameters and channel g…

    Submitted 2 June, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

    Comments: This work has been submitted to IEEE for possible publication

  26. arXiv:2502.05330 [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Multi-Class Segmentation of Aortic Branches and Zones in Computed Tomography Angiography: The AortaSeg24 Challenge

    Authors: Muhammad Imran, Jonathan R. Krebs, Vishal Balaji Sivaraman, Teng Zhang, Amarjeet Kumar, Walker R. Ueland, Michael J. Fassler, Jinlong Huang, Xiao Sun, Lisheng Wang, Pengcheng Shi, Maximilian Rokuss, Michael Baumgartner, Yannick Kirchhof, Klaus H. Maier-Hein, Fabian Isensee, Shuolin Liu, Bing Han, Bong Thanh Nguyen, Dong-jin Shin, Park Ji-Woo, Mathew Choi, Kwang-Hyun Uhm, Sung-Jea Ko, Chanwoong Lee, et al. (38 additional authors not shown)

    Abstract: Multi-class segmentation of the aorta in computed tomography angiography (CTA) scans is essential for diagnosing and planning complex endovascular treatments for patients with aortic dissections. However, existing methods reduce aortic segmentation to a binary problem, limiting their ability to measure diameters across different branches and zones. Furthermore, no open-source dataset is currently…

    Submitted 7 February, 2025; originally announced February 2025.

  27. arXiv:2502.05130 [pdf, other]

    cs.SD cs.AI cs.CV cs.MM eess.AS

    Latent Swap Joint Diffusion for 2D Long-Form Latent Generation

    Authors: Yusheng Dai, Chenxi Wang, Chang Li, Chen Wang, Jun Du, Kewei Li, Ruoyu Wang, Jiefeng Ma, Lei Sun, Jianqing Gao

    Abstract: This paper introduces Swap Forward (SaFa), a modality-agnostic and efficient method to generate seamless and coherence long spectrum and panorama through latent swap joint diffusion across multi-views. We first investigate the spectrum aliasing problem in spectrum-based audio generation caused by existing joint diffusion methods. Through a comparative analysis of the VAE latent representation of M…

    Submitted 18 March, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

  28. arXiv:2502.02934 [pdf, other]

    cs.RO eess.SY

    Gait-Net-augmented Implicit Kino-dynamic MPC for Dynamic Variable-frequency Humanoid Locomotion over Discrete Terrains

    Authors: Junheng Li, Ziwei Duan, Junchao Ma, Quan Nguyen

    Abstract: Reduced-order-model-based optimal control techniques for humanoid locomotion struggle to adapt step duration and placement simultaneously in dynamic walking gaits due to their reliance on fixed-time discretization, which limits responsiveness to various disturbances and results in suboptimal performance in challenging conditions. In this work, we propose a Gait-Net-augmented implicit kino-dynamic…

    Submitted 27 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 15 pages, 13 figures, RSS 2025 accepted

  29. arXiv:2501.09338 [pdf, other]

    cs.RO eess.SY

    Robust UAV Path Planning with Obstacle Avoidance for Emergency Rescue

    Authors: Junteng Mao, Ziye Jia, Hanzhi Gu, Chenyu Shi, Haomin Shi, Lijun He, Qihui Wu

    Abstract: The unmanned aerial vehicles (UAVs) are efficient tools for diverse tasks such as electronic reconnaissance, agricultural operations and disaster relief. In the complex three-dimensional (3D) environments, the path planning with obstacle avoidance for UAVs is a significant issue for security assurance. In this paper, we construct a comprehensive 3D scenario with obstacles and no-fly zones for dyna…

    Submitted 16 January, 2025; originally announced January 2025.

  30. arXiv:2501.04942 [pdf, other]

    cs.SD eess.AS

    Vision Graph Non-Contrastive Learning for Audio Deepfake Detection with Limited Labels

    Authors: Falih Gozi Febrinanto, Kristen Moore, Chandra Thapa, Jiangang Ma, Vidya Saikrishna, Feng Xia

    Abstract: Recent advancements in audio deepfake detection have leveraged graph neural networks (GNNs) to model frequency and temporal interdependencies in audio data, effectively identifying deepfake artifacts. However, the reliance of GNN-based methods on substantial labeled data for graph construction and robust performance limits their applicability in scenarios with limited labeled data. Although vast a…

    Submitted 8 January, 2025; originally announced January 2025.

  31. arXiv:2501.02870 [pdf]

    cs.IT eess.SP eess.SY

    Spectrum Sharing in 6G Space-Ground Integrated Networks: A Ground Protection Zone-Based Design

    Authors: Bodong Shang, Xiangyu Li, Zheng Wang, Junchao Ma

    Abstract: Space-ground integrated network (SGIN) has been envisioned as a competitive solution for large scale and wide coverage of future wireless networks. By integrating both the non-terrestrial network (NTN) and the terrestrial network (TN), SGIN can provide high speed and omnipresent wireless network access for the users using the predefined licensed spectrums. Considering the scarcity of the spectrum…

    Submitted 6 January, 2025; originally announced January 2025.

  32. arXiv:2501.02815 [pdf, other]

    cs.RO eess.SY

    Local Reactive Control for Mobile Manipulators with Whole-Body Safety in Complex Environments

    Authors: Chunxin Zheng, Yulin Li, Zhiyuan Song, Zhihai Bi, Jinni Zhou, Boyu Zhou, Jun Ma

    Abstract: Mobile manipulators typically encounter significant challenges in navigating narrow, cluttered environments due to their high-dimensional state spaces and complex kinematics. While reactive methods excel in dynamic settings, they struggle to efficiently incorporate complex, coupled constraints across the entire state space. In this work, we present a novel local reactive controller that reformulat…

    Submitted 6 January, 2025; originally announced January 2025.

  33. arXiv:2501.02530 [pdf, other]

    cs.RO cs.DC eess.SY

    UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving with Motion Prediction of Traffic Participants

    Authors: Haichao Liu, Kai Chen, Yulin Li, Zhenmin Huang, Ming Liu, Jun Ma

    Abstract: Current autonomous driving systems often struggle to balance decision-making and motion control while ensuring safety and traffic rule compliance, especially in complex urban environments. Existing methods may fall short due to separate handling of these functionalities, leading to inefficiencies and safety compromises. To address these challenges, we introduce UDMC, an interpretable and unified L…

    Submitted 5 January, 2025; originally announced January 2025.

  34. arXiv:2501.01456 [pdf, other]

    eess.IV cs.CV cs.LG

    SS-CTML: Self-Supervised Cross-Task Mutual Learning for CT Image Reconstruction

    Authors: Gaofeng Chen, Yaoduo Zhang, Li Huang, Pengfei Wang, Wenyu Zhang, Dong Zeng, Jianhua Ma, Ji He

    Abstract: Supervised deep-learning (SDL) techniques with paired training datasets have been widely studied for X-ray computed tomography (CT) image reconstruction. However, due to the difficulties of obtaining paired training datasets in clinical routine, the SDL methods are still away from common uses in clinical practices. In recent years, self-supervised deep-learning (SSDL) techniques have shown great p…

    Submitted 30 December, 2024; originally announced January 2025.

  35. arXiv:2412.16085 [pdf, other]

    eess.IV cs.CV

    Efficient MedSAMs: Segment Anything in Medical Images on Laptop

    Authors: Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu, Songxiao Yang, Lennart Purucker, Zdravko Marinov, Marius Staring, Haisheng Lu, Thuy Thanh Dao, Xincheng Ye, Zhi Li, Gianluca Brugnara, Philipp Vollmuth, Martha Foltyn-Dumitru, Jaeyoung Cho, Mustafa Ahmed Mahmutoglu, Martin Bendszus, Irada Pflüger, et al. (57 additional authors not shown)

    Abstract: Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

  36. arXiv:2412.10088 [pdf, other]

    eess.SY

    Model Order Reduction of Large-Scale Wind Farms: A Data-Driven Approach

    Authors: Zilong Gong, Junyu Mao, Adrià Junyent-Ferré, Giordano Scarciotti

    Abstract: This paper proposes a data-driven algorithm for model order reduction (MOR) of large-scale wind farms and studies the effects that the obtained reduced-order model (ROM) has when this is integrated into the power grid. With respect to standard MOR methods, the proposed algorithm has the advantages of having low computational complexity and not requiring any knowledge of the high order model. Using…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: This article has been accepted for publication by IEEE Transactions on Power Systems

  37. arXiv:2412.03514 [pdf, other]

    cs.IT eess.SP

    Adaptive Personalized Over-the-Air Federated Learning with Reflecting Intelligent Surfaces

    Authors: Jiayu Mao, Aylin Yener

    Abstract: Over-the-air federated learning (OTA-FL) unifies communication and model aggregation by leveraging the inherent superposition property of the wireless medium. This strategy can enable scalable and bandwidth-efficient learning via simultaneous transmission of model updates using the same frequency resources, if care is exercised to design the physical layer jointly with learning. In this paper, a f…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: submitted for an IEEE publication; Nov 2024

  38. arXiv:2412.00017 [pdf, other]

    eess.SY

    Frost/Defrost Models for Air-Source Heat Pumps with Retained Water Refreezing Considered

    Authors: Jiacheng Ma, Matthis Thorade

    Abstract: Cyclic frosting and defrosting operations constitute a common characteristic of air-source heat pumps in cold climates during winter. Simulation models that can capture simultaneous heat and mass transfer phenomena associated with frost/defrost behaviors and their impact on the overall heat pump system performance are of critical importance to improved controls of heat delivery and frost mitigatio…

    Submitted 15 November, 2024; originally announced December 2024.

  39. arXiv:2411.15447 [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

    Authors: Wei Guo, Heng Wang, Jianbo Ma, Weidong Cai

    Abstract: Vision-to-audio (V2A) synthesis has broad applications in multimedia. Recent advancements of V2A methods have made it possible to generate relevant audios from inputs of videos or still images. However, the immersiveness and expressiveness of the generation are limited. One possible problem is that existing methods solely rely on the global scene and overlook details of local sounding objects (i.e…

    Submitted 8 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 18 pages, 13 figures, source code available at https://github.com/wguo86/SSV2A

  40. arXiv:2411.14360 [pdf, other]

    eess.SP

    Integrated Positioning and Communication via LEO Satellites: Opportunities and Challenges

    Authors: Jie Ma, Pinjun Zheng, Xing Liu, Yuchen Zhang, Tareq Y. Al-Naffouri

    Abstract: Low Earth orbit (LEO) satellites, as a prominent technology in the 6G non-terrestrial network, offer both positioning and communication capabilities. While these two applications have each been extensively studied and have achieved substantial progress in recent years, the potential synergistic benefits of integrating them remain an underexplored yet promising avenue. This article comprehensively…

    Submitted 30 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

  41. arXiv:2411.10492 [pdf, other]

    cs.CV eess.IV

    MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

    Authors: Jinge Ma, Xiaoyan Zhang, Gautham Vinod, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

    Abstract: Food portion estimation is crucial for monitoring health and tracking dietary intake. Image-based dietary assessment, which involves analyzing eating occasion images using computer vision techniques, is increasingly replacing traditional methods such as 24-hour recalls. However, accurately estimating the nutritional content from images remains challenging due to the loss of 3D information when pro…

    Submitted 14 November, 2024; originally announced November 2024.

    Comments: 9th International Workshop on Multimedia Assisted Dietary Management, in conjunction with the 27th International Conference on Pattern Recognition (ICPR2024)

  42. arXiv:2411.07573  [pdf, other]

    cs.RO eess.SY

    Robotic Control Optimization Through Kernel Selection in Safe Bayesian Optimization

    Authors: Lihao Zheng, Hongxuan Wang, Xiaocong Li, Jun Ma, Prahlad Vadakkepat

    Abstract: Control system optimization has long been a fundamental challenge in robotics. While recent advancements have led to the development of control algorithms that leverage learning-based approaches, such as SafeOpt, to optimize single feedback controllers, scaling these methods to high-dimensional complex systems with multiple controllers remains an open problem. In this paper, we propose a novel lea…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Biomimetics (ROBIO)

  43. arXiv:2411.05492  [pdf, ps, other]

    cs.IT eess.SP math.OC

    Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

    Authors: Ziyue Wang, Yang Li, Ya-Feng Liu, Junjie Ma

    Abstract: This paper studies the device activity detection problem in a massive multiple-input multiple-output (MIMO) system for near-field communications (NFC). In this system, active devices transmit their signature sequences to the base station (BS), which detects the active devices based on the received signal. In this paper, we model the near-field channels as correlated Rician fading channels and form…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: 15 pages, 8 figures, submitted for possible publication

  44. arXiv:2410.22830  [pdf, other]

    eess.IV cs.CV

    Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images

    Authors: Hanlin Wu, Jiangwei Mo, Xiaohui Sun, Jie Ma

    Abstract: Recent advancements in diffusion models have significantly improved performance in super-resolution (SR) tasks. However, previous research often overlooks the fundamental differences between SR and general image generation. General image generation involves creating images from scratch, while SR focuses specifically on enhancing existing low-resolution (LR) images by adding typically missing high-…

    Submitted 30 October, 2024; originally announced October 2024.

  45. arXiv:2410.19383  [pdf, other]

    eess.SP

    A Modulo Sampling Hardware Prototype and Reconstruction Algorithm Evaluation

    Authors: Jiang Zhu, Junnan Ma, Zhenlong Liu, Fengzhong Qu, Zheng Zhu, Qi Zhang

    Abstract: Analog-to-digital converters (ADCs) play a vitally important role in any device that manipulates analog signals in a digital manner. When the amplitude of the signal exceeds the dynamic range of the ADCs, clipping occurs and the quality of the digitized signal degrades significantly. In this paper, we design a joint modulo sampling hardware and processing prototype which improves the ADCs' dy…

    Submitted 25 October, 2024; originally announced October 2024.

  46. arXiv:2410.17084  [pdf, other]

    cs.RO eess.IV

    GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

    Authors: Yusen Xie, Zhenmin Huang, Jin Wu, Jun Ma

    Abstract: In this paper, we introduce GS-LIVM, a real-time photo-realistic LiDAR-Inertial-Visual mapping framework with Gaussian Splatting tailored for outdoor scenes. Compared to existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), our approach enables real-time photo-realistic mapping while ensuring high-quality image rendering in large-scale unbounded outdoor environm…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 15 pages, 13 figures

  47. arXiv:2410.12419  [pdf, other]

    eess.IV cs.CV

    Mind the Context: Attention-Guided Weak-to-Strong Consistency for Enhanced Semi-Supervised Medical Image Segmentation

    Authors: Yuxuan Cheng, Chenxi Shao, Jie Ma, Yunfei Xie, Guoliang Li

    Abstract: Medical image segmentation is a pivotal step in diagnostic and therapeutic processes, relying on high-quality annotated data that is often challenging and costly to obtain. Semi-supervised learning offers a promising approach to enhance model performance by leveraging unlabeled data. Although weak-to-strong consistency is a prevalent method in semi-supervised image segmentation, there is a scarcit…

    Submitted 7 July, 2025; v1 submitted 16 October, 2024; originally announced October 2024.

  48. arXiv:2410.11299  [pdf, other]

    cs.SD eess.AS

    Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

    Authors: Saksham Singh Kushwaha, Jianbo Ma, Mark R. P. Thomas, Yapeng Tian, Avery Bruni

    Abstract: Spatial audio is a crucial component in creating immersive experiences. Traditional simulation-based approaches to generate spatial audio rely on expertise, have limited scalability, and assume independence between semantic and spatial information. To address these issues, we explore end-to-end spatial audio generation. We introduce and formulate a new task of generating first-order Ambisonics (FO…

    Submitted 15 October, 2024; originally announced October 2024.

  49. arXiv:2409.14342  [pdf, other]

    cs.RO eess.SY

    Adapting Gait Frequency for Posture-regulating Humanoid Push-recovery via Hierarchical Model Predictive Control

    Authors: Junheng Li, Zhanhao Le, Junchao Ma, Quan Nguyen

    Abstract: Current humanoid push-recovery strategies often use whole-body motion, yet they tend to overlook posture regulation. For instance, in manipulation tasks, the upper body may need to stay upright and have minimal recovery displacement. This paper introduces a novel approach to enhancing humanoid push-recovery performance under unknown disturbances and regulating body posture by tailoring the recover…

    Submitted 27 May, 2025; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: 7 pages, 6 figures, accepted to ICRA 2025

  50. arXiv:2409.10310  [pdf, other]

    cs.RO eess.SY

    Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization

    Authors: Lei Zheng, Rui Yang, Minzhe Zheng, Michael Yu Wang, Jun Ma

    Abstract: Ensuring safety and driving consistency is a significant challenge for autonomous vehicles operating in partially observed environments. This work introduces a consistent parallel trajectory optimization (CPTO) approach to enable safe and consistent driving in dense obstacle environments with perception uncertainties. Utilizing discrete-time barrier function theory, we develop a consensus safety b…

    Submitted 16 September, 2024; originally announced September 2024.