
Showing 1–50 of 197 results for author: Xu, M

Searching in archive eess.
  1. arXiv:2506.21171  [pdf, ps, other]

    eess.IV cs.CV

    Uncover Treasures in DCT: Advancing JPEG Quality Enhancement by Exploiting Latent Correlations

    Authors: Jing Yang, Qunliang Xing, Mai Xu, Minglang Qiao

    Abstract: Joint Photographic Experts Group (JPEG) achieves data compression by quantizing Discrete Cosine Transform (DCT) coefficients, which inevitably introduces compression artifacts. Most existing JPEG quality enhancement methods operate in the pixel domain, suffering from the high computational costs of decoding. Consequently, direct enhancement of JPEG images in the DCT domain has gained increasing at…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.20282  [pdf, ps, other]

    eess.IV cs.CV

    Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration

    Authors: Jiaxing Huang, Heng Guo, Le Lu, Fan Yang, Minfeng Xu, Ge Yang, Wei Luo

    Abstract: Osteoporosis, characterized by reduced bone mineral density (BMD) and compromised bone microstructure, increases fracture risk in aging populations. While dual-energy X-ray absorptiometry (DXA) is the clinical standard for BMD assessment, its limited accessibility hinders diagnosis in resource-limited regions. Opportunistic computed tomography (CT) analysis has emerged as a promising alternative f…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  3. arXiv:2506.17924  [pdf, ps, other]

    math.OC eess.SY

    Inverse Chance Constrained Optimal Power Flow

    Authors: Shenglu Wang, Kairui Feng, Mengqi Xue, Yue Song

    Abstract: The chance constrained optimal power flow (CC-OPF) essentially finds the low-cost generation dispatch scheme ensuring operational constraints are met with a specified probability, termed the security level. While the security level is a crucial input parameter, how it shapes the CC-OPF feasibility boundary has not been revealed. Changing the security level from a parameter to a decision variable,…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 3 pages, 1 figure

  4. arXiv:2506.14152  [pdf, ps, other]

    eess.IV

    Breaking the Multi-Enhancement Bottleneck: Domain-Consistent Quality Enhancement for Compressed Images

    Authors: Qunliang Xing, Mai Xu, Jing Yang, Shengxi Li

    Abstract: Quality enhancement methods have been widely integrated into visual communication pipelines to mitigate artifacts in compressed images. Ideally, these quality enhancement methods should perform robustly when applied to images that have already undergone prior enhancement during transmission. We refer to this scenario as multi-enhancement, which generalizes the well-known multi-generation scenario…

    Submitted 16 June, 2025; originally announced June 2025.

  5. arXiv:2506.05854  [pdf, ps, other]

    eess.SY

    Towards Next-Generation Intelligent Maintenance: Collaborative Fusion of Large and Small Models

    Authors: Xiaoyi Yuan, Qiming Huang, Mingqing Guo, Huiming Ma, Ming Xu, Zeyi Liu, Xiao He

    Abstract: With the rapid advancement of intelligent technologies, collaborative frameworks integrating large and small models have emerged as a promising approach for enhancing industrial maintenance. However, several challenges persist, including limited domain adaptability, insufficient real-time performance and reliability, high integration complexity, and difficulties in knowledge representation and fus…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 6 pages, 5 figures, Accepted by the 2025 CAA Symposium on Fault Detection, Supervision and Safety for Technical Processes (SAFEPROCESS 2025)

  6. arXiv:2505.23706  [pdf, ps, other]

    cs.NI cs.AI cs.DC cs.IT eess.SP

    Distributed Federated Learning for Vehicular Network Security: Anomaly Detection Benefits and Multi-Domain Attack Threats

    Authors: Utku Demir, Yalin E. Sagduyu, Tugba Erpek, Hossein Jafari, Sastry Kompella, Mengran Xue

    Abstract: In connected and autonomous vehicles, machine learning for safety message classification has become critical for detecting malicious or anomalous behavior. However, conventional approaches that rely on centralized data collection or purely local training face limitations due to the large scale, high mobility, and heterogeneous data distributions inherent in inter-vehicle networks. To overcome thes…

    Submitted 29 May, 2025; originally announced May 2025.

  7. arXiv:2505.13062  [pdf, other]

    cs.MM cs.SD eess.AS

    Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model

    Authors: Yong Ren, Chenxing Li, Le Xu, Hao Gu, Duzhen Zhang, Yujie Chen, Manjie Xu, Ruibo Fu, Shan Yang, Dong Yu

    Abstract: Humans can intuitively infer sounds from silent videos, but whether multimodal large language models can perform modal-mismatch reasoning without accessing target modalities remains relatively unexplored. Current text-assisted-video-to-audio (VT2A) methods excel in video foley tasks but struggle to acquire audio descriptions during inference. We introduce the task of Reasoning Audio Descriptions f…

    Submitted 27 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  8. arXiv:2505.04068  [pdf, other]

    cs.NI eess.SP

    Shadow Wireless Intelligence: Large Language Model-Driven Reasoning in Covert Communications

    Authors: Yuanai Xie, Zhaozhi Liu, Xiao Zhang, Shihua Zhang, Rui Hou, Minrui Xu, Ruichen Zhang, Dusit Niyato

    Abstract: Covert Communications (CC) can secure sensitive transmissions in industrial, military, and mission-critical applications within 6G wireless networks. However, traditional optimization methods based on Artificial Noise (AN), power control, and channel manipulation might not adapt to dynamic and adversarial environments due to the high dimensionality, nonlinearity, and stringent real-time covertness…

    Submitted 6 May, 2025; originally announced May 2025.

  9. arXiv:2504.08278  [pdf, ps, other]

    math.OC cs.RO eess.SY

    Interior Point Differential Dynamic Programming, Redux

    Authors: Ming Xu, Stephen Gould, Iman Shames

    Abstract: We present IPDDP2, a structure-exploiting algorithm for solving discrete-time, finite-horizon optimal control problems (OCPs) with nonlinear constraints. Inequality constraints are handled using a primal-dual interior point formulation and step acceptance for equality constraints follows a line-search filter approach. The iterates of the algorithm are derived under the Differential Dynamic Program…

    Submitted 13 June, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  10. arXiv:2504.01038  [pdf, other]

    eess.IV cs.CV cs.HC

    An Integrated AI-Enabled System Using One Class Twin Cross Learning (OCT-X) for Early Gastric Cancer Detection

    Authors: Xian-Xian Liu, Yuanyuan Wei, Mingkun Xu, Yongze Guo, Hongwei Zhang, Huicong Dong, Qun Song, Qi Zhao, Wei Luo, Feng Tien, Juntao Gao, Simon Fong

    Abstract: Early detection of gastric cancer, a leading cause of cancer-related mortality worldwide, remains hampered by the limitations of current diagnostic technologies, leading to high rates of misdiagnosis and missed diagnoses. To address these challenges, we propose an integrated system that synergizes advanced hardware and software technologies to balance speed-accuracy. Our study introduces the One C…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 26 pages, 4 figures, 6 tables

  11. arXiv:2503.12698  [pdf, other]

    eess.IV cs.CV

    A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

    Authors: Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong , et al. (9 additional authors not shown)

    Abstract: Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized…

    Submitted 16 March, 2025; originally announced March 2025.

  12. arXiv:2503.05107  [pdf, ps, other]

    eess.IV cs.CV

    We Care Each Pixel: Calibrating on Medical Segmentation Model

    Authors: Wenhao Liang, Wei Zhang, Lin Yue, Miao Xu, Olaf Maennel, Weitong Chen

    Abstract: Medical image segmentation is fundamental for computer-aided diagnostics, providing accurate delineation of anatomical structures and pathological regions. While common metrics such as Accuracy, DSC, IoU, and HD primarily quantify spatial agreement between predictions and ground-truth labels, they do not assess the calibration quality of segmentation models, which is crucial for clinical reliabili…

    Submitted 13 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: Under Reviewing

  13. arXiv:2503.00011  [pdf, other]

    eess.SP

    Fluid Antenna Enabled Over-the-Air Federated Learning: Joint Optimization of Positioning, Beamforming, and User Selection

    Authors: Yang Zhao, Minrui Xu, Ping Wang, Dusit Niyato

    Abstract: Over-the-air (OTA) federated learning (FL) effectively utilizes communication bandwidth, yet it is vulnerable to errors during analog aggregation. While removing users with unfavorable channel conditions can mitigate these errors, it also reduces the available local training data for FL, which in turn hinders the convergence rate of the training process. To tackle this issue, we propose using flui…

    Submitted 17 February, 2025; originally announced March 2025.

  14. arXiv:2502.05713  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    4D VQ-GAN: Synthesising Medical Scans at Any Time Point for Personalised Disease Progression Modelling of Idiopathic Pulmonary Fibrosis

    Authors: An Zhao, Moucheng Xu, Ahmed H. Shahin, Wim Wuyts, Mark G. Jones, Joseph Jacob, Daniel C. Alexander

    Abstract: Understanding the progression trajectories of diseases is crucial for early diagnosis and effective treatment planning. This is especially vital for life-threatening conditions such as Idiopathic Pulmonary Fibrosis (IPF), a chronic, progressive lung disease with a prognosis comparable to many cancers. Computed tomography (CT) imaging has been established as a reliable diagnostic tool for IPF. Accu…

    Submitted 8 February, 2025; originally announced February 2025.

    Comments: 4D image synthesis, VQ-GAN, neural ODEs, spatial temporal disease progression modelling, CT, IPF

  15. arXiv:2502.01108  [pdf, other]

    cs.LG cs.AI eess.SP

    Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings

    Authors: Mithun Saha, Maxwell A. Xu, Wanting Mao, Sameer Neupane, James M. Rehg, Santosh Kumar

    Abstract: Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to generalize across diverse health applications. In this paper, we introduce Pulse-PPG, the first open-source PPG foundation model trained exclusively on raw PPG data collected over a 100-day field study with 120 participants. Existing PPG foundation…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: The first two listed authors contributed equally to this research

  16. arXiv:2501.15588  [pdf, other]

    eess.IV cs.CV

    Tumor Detection, Segmentation and Classification Challenge on Automated 3D Breast Ultrasound: The TDSC-ABUS Challenge

    Authors: Gongning Luo, Mingwang Xu, Hongyu Chen, Xinjie Liang, Xing Tao, Dong Ni, Hyunsu Jeong, Chulhong Kim, Raphael Stock, Michael Baumgartner, Yannick Kirchhoff, Maximilian Rokuss, Klaus Maier-Hein, Zhikai Yang, Tianyu Fan, Nicolas Boutry, Dmitry Tereshchenko, Arthur Moine, Maximilien Charmetant, Jan Sauer, Hao Du, Xiang-Hui Bai, Vipul Pai Raikar, Ricardo Montoya-del-Angel, Robert Marti , et al. (12 additional authors not shown)

    Abstract: Breast cancer is one of the most common causes of death among women worldwide. Early detection helps in reducing the number of deaths. Automated 3D Breast Ultrasound (ABUS) is a newer approach for breast screening, which has many advantages over handheld mammography such as safety, speed, and higher detection rate of breast cancer. Tumor detection, segmentation, and classification are key componen…

    Submitted 26 January, 2025; originally announced January 2025.

  17. arXiv:2412.20685  [pdf, other]

    eess.IV cs.MM

    MarsSQE: Stereo Quality Enhancement for Martian Images Using Bi-level Cross-view Attention

    Authors: Mai Xu, Yinglin Zhu, Qunliang Xing, Jing Yang, Xin Zou

    Abstract: Stereo images captured by Mars rovers are transmitted after lossy compression due to the limited bandwidth between Mars and Earth. Unfortunately, this process results in undesirable compression artifacts. In this paper, we present a novel stereo quality enhancement approach for Martian images, named MarsSQE. First, we establish the first dataset of stereo Martian images. Through extensive analysis…

    Submitted 29 December, 2024; originally announced December 2024.

  18. arXiv:2412.17211  [pdf, other]

    eess.SP

    Joint Multitarget Detection and Tracking with mmWave Radar

    Authors: Jiang Zhu, Menghuai Xu, Ruohai Guo, Fangyong Wang, Guangying Zheng, Fengzhong Qu

    Abstract: Accurate targets detection and tracking with mmWave radar is a key sensing capability that will enable more intelligent systems, create smart, efficient, automated system. This paper proposes an end-to-end detection-estimation-track framework named MNOMP-SPA-KF consisting of the target detection and estimation module, the data association (DA) module and the target tracking module. In the target e…

    Submitted 22 December, 2024; originally announced December 2024.

  19. arXiv:2412.15622  [pdf, other]

    eess.AS cs.CL eess.SP

    TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

    Authors: Xingchen Song, Chengdong Liang, Binbin Zhang, Pengshen Zhang, ZiYu Wang, Youcheng Ma, Menglong Xu, Lin Wang, Di Wu, Fuping Pan, Dinghao Zhou, Zhendong Peng

    Abstract: Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can merely be deployed on high-compute cloud platforms and are only capable of performing speech recognition tasks. This leads to high costs and restricted capabilities. In this report, we initially pr…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: Technical Report

  20. arXiv:2412.15596  [pdf, other]

    eess.SP

    Resonant Beam Enabled Passive 3D Positioning

    Authors: Yixuan Guo, Mingliang Xiong, Wen Fang, Qingwei Jiang, Mengyuan Xu, Qingwen Liu, Gang Yan

    Abstract: With the rapid development of the internet of things (IoT), location-based services are becoming increasingly prominent in various aspects of social life, and accurate location information is crucial. However, RF-based indoor positioning solutions are severely limited in positioning accuracy due to signal transmission losses and directional difficulties, and optical indoor positioning methods requ…

    Submitted 20 December, 2024; originally announced December 2024.

  21. arXiv:2412.06011  [pdf, other]

    eess.IV cs.CV

    TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model

    Authors: Meilong Xu, Saumya Gupta, Xiaoling Hu, Chen Li, Shahira Abousamra, Dimitris Samaras, Prateek Prasanna, Chao Chen

    Abstract: Accurately modeling multi-class cell topology is crucial in digital pathology, as it provides critical insights into tissue structure and pathology. The synthetic generation of cell topology enables realistic simulations of complex tissue environments, enhances downstream tasks by augmenting training data, aligns more closely with pathologists' domain knowledge, and offers new opportunities for co…

    Submitted 24 March, 2025; v1 submitted 8 December, 2024; originally announced December 2024.

    Comments: Accepted by CVPR 2025. 15 pages, 8 figures

  22. arXiv:2411.18822  [pdf, other]

    eess.SP cs.AI cs.LG

    RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data

    Authors: Maxwell A. Xu, Jaya Narain, Gregory Darnell, Haraldur Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Fineman, Karthik J. Raghuram, James M. Rehg, Shirley Ren

    Abstract: We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information such as rotation invariance. Then, the learned distance provides a measurement of semantic similarity between a pair of accele…

    Submitted 10 April, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to ICLR 2025. Code here: https://github.com/maxxu05/relcon

    Journal ref: The Thirteenth International Conference on Learning Representations (ICLR), 2025

  23. arXiv:2411.18369  [pdf, ps, other]

    cs.RO cs.AI cs.CV eess.SY

    G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

    Authors: Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo

    Abstract: Recent advances in imitation learning for 3D robotic manipulation have shown promising results with diffusion-based policies. However, achieving human-level dexterity requires seamless integration of geometric precision and semantic understanding. We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D semantic representation by leveraging foundat…

    Submitted 21 June, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Webpage: https://tianxingchen.github.io/G3Flow/, accepted to CVPR 2025

  24. arXiv:2411.18266  [pdf]

    eess.AS cs.AI cs.SD eess.SY

    Wearable intelligent throat enables natural speech in stroke patients with dysarthria

    Authors: Chenyu Tang, Shuo Gao, Cong Li, Wentian Yi, Yuxuan Jin, Xiaoxue Zhai, Sixuan Lei, Hongbei Meng, Zibo Zhang, Muzi Xu, Shengbo Wang, Xuhang Chen, Chenxi Wang, Hongyun Yang, Ningli Wang, Wenyu Wang, Jin Cao, Xiaodong Feng, Peter Smielewski, Yu Pan, Wenhui Song, Martin Birchall, Luigi G. Occhipinti

    Abstract: Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to ena…

    Submitted 14 March, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: 5 figures, 45 references

  25. arXiv:2411.15248  [pdf, other]

    eess.IV cs.CV

    J-Invariant Volume Shuffle for Self-Supervised Cryo-Electron Tomogram Denoising on Single Noisy Volume

    Authors: Xiwei Liu, Mohamad Kassab, Min Xu, Qirong Ho

    Abstract: Cryo-Electron Tomography (Cryo-ET) enables detailed 3D visualization of cellular structures in near-native states but suffers from low signal-to-noise ratio due to imaging constraints. Traditional denoising methods and supervised learning approaches often struggle with complex noise patterns and the lack of paired datasets. Self-supervised methods, which utilize noisy input itself as a target, hav…

    Submitted 22 February, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 10 pages, 7 figures, 7 tables

  26. arXiv:2411.15076  [pdf, other]

    eess.IV cs.CV q-bio.QM

    RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency

    Authors: Wentao Huang, Meilong Xu, Xiaoling Hu, Shahira Abousamra, Aniruddha Ganguly, Saarthak Kapse, Alisa Yurovsky, Prateek Prasanna, Tahsin Kurc, Joel Saltz, Michael L. Miller, Chao Chen

    Abstract: Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture comp…

    Submitted 22 March, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: 18 pages, 9 figures

  27. arXiv:2411.14385  [pdf, other]

    eess.IV cs.CV

    Enhancing Diagnostic Precision in Gastric Bleeding through Automated Lesion Segmentation: A Deep DuS-KFCM Approach

    Authors: Xian-Xian Liu, Mingkun Xu, Yuanyuan Wei, Huafeng Qin, Qun Song, Simon Fong, Feng Tien, Wei Luo, Juntao Gao, Zhihua Zhang, Shirley Siu

    Abstract: Timely and precise classification and segmentation of gastric bleeding in endoscopic imagery are pivotal for the rapid diagnosis and intervention of gastric complications, which is critical in life-saving medical procedures. Traditional methods grapple with the challenge posed by the indistinguishable intensity values of bleeding tissues adjacent to other gastric structures. Our study seeks to rev…

    Submitted 25 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

  28. arXiv:2411.10772  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG stat.ML

    MRI Parameter Mapping via Gaussian Mixture VAE: Breaking the Assumption of Independent Pixels

    Authors: Moucheng Xu, Yukun Zhou, Tobias Goodwin-Allcock, Kimia Firoozabadi, Joseph Jacob, Daniel C. Alexander, Paddy J. Slator

    Abstract: We introduce and demonstrate a new paradigm for quantitative parameter mapping in MRI. Parameter mapping techniques, such as diffusion MRI and quantitative MRI, have the potential to robustly and repeatably measure biologically-relevant tissue maps that strongly relate to underlying microstructure. Quantitative maps are calculated by fitting a model to multiple images, e.g. with least-squares or m…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024 Workshop in Machine Learning and the Physical Sciences

  29. arXiv:2411.09593  [pdf, other]

    eess.IV cs.AI cs.CV

    SMILE-UHURA Challenge -- Small Vessel Segmentation at Mesoscopic Scale from Ultra-High Resolution 7T Magnetic Resonance Angiograms

    Authors: Soumick Chatterjee, Hendrik Mattern, Marc Dörner, Alessandro Sciarra, Florian Dubost, Hannes Schnurre, Rupali Khatun, Chun-Chih Yu, Tsung-Lin Hsieh, Yi-Shan Tsai, Yi-Zeng Fang, Yung-Ching Yang, Juinn-Dar Huang, Marshall Xu, Siyu Liu, Fernanda L. Ribeiro, Saskia Bollmann, Karthikesh Varma Chintalapati, Chethan Mysuru Radhakrishna, Sri Chandana Hudukula Ram Kumara, Raviteja Sutrave, Abdul Qayyum, Moona Mazher, Imran Razzak, Cristobal Rodero , et al. (23 additional authors not shown)

    Abstract: The human brain receives nutrients and oxygen through an intricate network of blood vessels. Pathology affecting small vessels, at the mesoscopic scale, represents a critical vulnerability within the cerebral blood supply and can lead to severe conditions, such as Cerebral Small Vessel Diseases. The advent of 7 Tesla MRI systems has enabled the acquisition of higher spatial resolution images, maki…

    Submitted 14 November, 2024; originally announced November 2024.

  30. arXiv:2411.07069  [pdf, other]

    eess.SY

    Two-Stage Stochastic Optimization for Low-Carbon Dispatch in a Combined Energy System

    Authors: Manling Hu, Manqi Xu, Dunnan Liu

    Abstract: While wind and solar power contribute to sustainability, their intermittent nature poses challenges when integrated into the grid. To mitigate these issues, renewable energy can be combined with coal fired power and hydropower sources to stabilize the energy system, with battery storage serving as a backup source to smooth the total output. This study develops a low carbon dispatch model for a com…

    Submitted 11 November, 2024; originally announced November 2024.

    Comments: 5 pages, 5 figures, accepted for publication in The 8th IEEE Conference on Energy Internet and Energy System Integration

  31. arXiv:2411.06721  [pdf, other]

    eess.SP

    Movable Antenna-Aided Federated Learning with Over-the-Air Aggregation: Joint Optimization of Positioning, Beamforming, and User Selection

    Authors: Yang Zhao, Yue Xiu, Minrui Xu, Ning Wei

    Abstract: Federated learning (FL) in wireless computing effectively utilizes communication bandwidth, yet it is vulnerable to errors during the analog aggregation process. While removing users with unfavorable channel conditions can mitigate these errors, it also reduces the available local training data for FL, which in turn hinders the convergence rate of the training process. To tackle this issue, we pro…

    Submitted 21 April, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  32. arXiv:2411.05959  [pdf, other]

    eess.IV cs.CV

    Efficient Self-Supervised Barlow Twins from Limited Tissue Slide Cohorts for Colonic Pathology Diagnostics

    Authors: Cassandre Notton, Vasudev Sharma, Vincent Quoc-Huy Trinh, Lina Chen, Minqi Xu, Sonal Varma, Mahdi S. Hosseini

    Abstract: Colorectal cancer (CRC) is one of the few cancers that have an established dysplasia-carcinoma sequence that benefits from screening. Everyone over 50 years of age in Canada is eligible for CRC screening. About 20% of those people will undergo a biopsy for a pre-neoplastic polyp and, in many cases, multiple polyps. As such, these polyp biopsies make up the bulk of a pathologist's workload. Develo…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Submission Under Review

  33. arXiv:2411.02089  [pdf, other]

    eess.SY

    Bidding and Dispatch Strategies with Flexibility Quantification and Pricing for Electric Vehicle Aggregator in Joint Energy-Regulation Market

    Authors: Manqi Xu, Ye Guo, Hongbin Sun

    Abstract: Managing and unlocking the flexibility hidden in electric vehicles (EVs) has emerged as a critical yet challenging task towards low-carbon power and energy systems. This paper focuses on the online bidding and dispatch strategies for an EV aggregator (EVA) in a joint energy-regulation market, considering EVs' flexibility contributions and compensations. A method for quantifying EV flexibility as a…

    Submitted 4 November, 2024; originally announced November 2024.

  34. arXiv:2410.18610  [pdf, other]

    eess.IV cs.CV

    A Joint Representation Using Continuous and Discrete Features for Cardiovascular Diseases Risk Prediction on Chest CT Scans

    Authors: Minfeng Xu, Chen-Chen Fan, Yan-Jie Zhou, Wenchao Guo, Pan Liu, Jing Qi, Le Lu, Hanqing Chao, Kunlun He

    Abstract: Cardiovascular diseases (CVD) remain a leading health concern and contribute significantly to global mortality rates. While clinical advancements have led to a decline in CVD mortality, accurately identifying individuals who could benefit from preventive interventions remains an unsolved challenge in preventive cardiology. Current CVD risk prediction models, recommended by guidelines, are based on…

    Submitted 15 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: 23 pages, 9 figures

  35. arXiv:2410.06544  [pdf, other]

    cs.SD eess.AS

    SRC-gAudio: Sampling-Rate-Controlled Audio Generation

    Authors: Chenxing Li, Manjie Xu, Dong Yu

    Abstract: We introduce SRC-gAudio, a novel audio generation model designed to facilitate text-to-audio generation across a wide range of sampling rates within a single model architecture. SRC-gAudio incorporates the sampling rate as part of the generation condition to guide the diffusion-based audio generation process. Our model enables the generation of audio at multiple sampling rates with a single unifie…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted by APSIPA2024

  36. arXiv:2410.04847  [pdf, other]

    eess.IV cs.CV

    Causal Context Adjustment Loss for Learned Image Compression

    Authors: Minghao Han, Shiyin Jiang, Shengxi Li, Xin Deng, Mai Xu, Ce Zhu, Shuhang Gu

    Abstract: In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance. Most present learned techniques are VAE-based with an autoregressive entropy model, which obviously promotes the RD performance by utilizing the decoded causal context. However, extant methods are highly dependent on the fixed hand-crafted causal c…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: Accepted to NeurIPS 2024

  37. arXiv:2410.04128  [pdf, other]

    eess.IV cs.CV

    Optimizing Medical Image Segmentation with Advanced Decoder Design

    Authors: Weibin Yang, Zhiqi Dong, Mingyuan Xu, Longwei Xu, Dehua Geng, Yusong Li, Pengwei Wang

    Abstract: U-Net is widely used in medical image segmentation due to its simple and flexible architecture design. To address the challenges of scale and complexity in medical tasks, several variants of U-Net have been proposed. In particular, methods based on Vision Transformer (ViT), represented by Swin UNETR, have gained widespread attention in recent years. However, these improvements often focus on the e…

    Submitted 5 October, 2024; originally announced October 2024.

  38. arXiv:2410.01469  [pdf, other]

    cs.SD cs.AI eess.AS

    TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

    Authors: Mohan Xu, Kai Li, Guo Chen, Xiaolin Hu

    Abstract: In recent years, much speech separation research has focused primarily on improving model performance. However, for low-latency speech processing systems, high efficiency is equally important. Therefore, we propose a speech separation model with significantly reduced parameters and computational costs: Time-frequency Interleaved Gain Extraction and Reconstruction network (TIGER). TIGER leverages p…

    Submitted 5 March, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: Accepted by ICLR 2025, demo page: https://cslikai.cn/TIGER/

  39. arXiv:2410.00986  [pdf, other]

    eess.IV cs.CV

    TransResNet: Integrating the Strengths of ViTs and CNNs for High Resolution Medical Image Segmentation via Feature Grafting

    Authors: Muhammad Hamza Sharif, Dmitry Demidov, Asif Hanif, Mohammad Yaqub, Min Xu

    Abstract: High-resolution images are preferable in medical imaging domain as they significantly improve the diagnostic capability of the underlying method. In particular, high resolution helps substantially in improving automatic image segmentation. However, most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and p…

    Submitted 1 October, 2024; originally announced October 2024.

    Comments: The 33rd British Machine Vision Conference 2022

  40. arXiv:2409.16063  [pdf, other]

    cs.CV eess.IV

    Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data

    Authors: An Wang, Haochen Yin, Beilei Cui, Mengya Xu, Hongliang Ren

    Abstract: Accurate depth perception is crucial for patient outcomes in endoscopic surgery, yet it is compromised by image distortions common in surgical settings. To tackle this issue, our study presents a benchmark for assessing the robustness of endoscopic depth estimation models. We have compiled a comprehensive dataset that reflects real-world conditions, incorporating a range of synthetically induced c…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: To appear at the Simulation and Synthesis in Medical Imaging (SASHIMI) workshop at MICCAI 2024

  41. arXiv:2409.15353  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Contextualization of ASR with LLM using phonetic retrieval-based augmentation

    Authors: Zhihong Lei, Xingyu Na, Mingbin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang

    Abstract: Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input. However, it remains a challenge for the model to recognize personal named entities, such as contacts in a phone book, when the input modality is speech. In this work, we start with a speech recognition tas…

    Submitted 11 September, 2024; originally announced September 2024.

  42. arXiv:2409.14418  [pdf, ps, other]

    eess.SP

    Delay Minimization for Movable Antennas-Enabled Anti-Jamming Communications With Mobile Edge Computing

    Authors: Yue Xiu, Yang Zhao, Songjie Yang, Minrui Xu, Dusit Niyato, Yueyang Li, Ning Wei

    Abstract: In future 6G networks, anti-jamming will become a critical challenge, particularly with the development of intelligent jammers that can initiate malicious interference, posing a significant security threat to communication transmission. Additionally, 6G networks have introduced mobile edge computing (MEC) technology to reduce system delay for edge user equipment (UEs). Thus, one of the key challen…

    Submitted 22 September, 2024; originally announced September 2024.

  43. arXiv:2409.08601  [pdf, other]

    cs.SD cs.MM eess.AS

    STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

    Authors: Yong Ren, Chenxing Li, Manjie Xu, Wei Liang, Yu Gu, Rilin Chen, Dong Yu

    Abstract: Visual and auditory perception are two crucial ways humans experience the world. Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this paper, we propose Semantic and Temporal Aligned Video-to-Audio (STA-V2A), an approach that enhances audio generation from videos by extracting both l…

    Submitted 24 March, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  44. arXiv:2409.08585  [pdf, other]

    cs.CV eess.IV

    Optimizing 4D Lookup Table for Low-light Video Enhancement via Wavelet Priori

    Authors: Jinhong He, Minglong Xue, Wenhai Wang, Mingliang Zhou

    Abstract: Low-light video enhancement is highly demanding in maintaining spatiotemporal color consistency. Therefore, improving the accuracy of color mapping and keeping the latency low is challenging. Based on this, we propose incorporating Wavelet-priori for 4D Lookup Table (WaveLUT), which effectively enhances the color coherence between video frames and the accuracy of color mapping while maintaining lo…

    Submitted 13 September, 2024; originally announced September 2024.

  45. Performance Assessment of Feature Detection Methods for 2-D FS Sonar Imagery

    Authors: Hitesh Kyatham, Shahriar Negahdaripour, Michael Xu, Xiaomin Lin, Miao Yu, Yiannis Aloimonos

    Abstract: Underwater robot perception is crucial in scientific subsea exploration and commercial operations. The key challenges include non-uniform lighting and poor visibility in turbid environments. High-frequency forward-look sonar cameras address these issues, by providing high-resolution imagery at maximum range of tens of meters, despite complexities posed by high degree of speckle noise, and lack of…

    Submitted 11 September, 2024; originally announced September 2024.

    Journal ref: OCEANS 2024 - Halifax

  46. arXiv:2409.01695  [pdf, other]

    cs.SD cs.AI eess.AS

    USTC-KXDIGIT System Description for ASVspoof5 Challenge

    Authors: Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu

    Abstract: This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend f…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: ASVspoof5 workshop paper

  47. arXiv:2409.00481  [pdf, other]

    eess.AS cs.SD

    DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module

    Authors: Xinyu Wang, Haotian Jiang, Haolin Huang, Yu Fang, Mengjie Xu, Qian Wang

    Abstract: Speech recognition is the technology that enables machines to interpret and process human speech, converting spoken language into text or commands. This technology is essential for applications such as virtual assistants, transcription services, and communication tools. The Audio-Visual Speech Recognition (AVSR) model enhances traditional speech recognition, particularly in noisy environments, by…

    Submitted 8 January, 2025; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted to ICASSP 2025

  48. arXiv:2408.15585  [pdf, other]

    cs.SD eess.AS

    Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models

    Authors: Yiyang Zhao, Shuai Wang, Guangzhi Sun, Zehua Chen, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks to derive highly discriminative speaker embeddings. Experimental results demonstrate that using the middle to later blocks of the Whisper encoder keeps…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024

  49. A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification

    Authors: Xujiang Xing, Mingxing Xu, Thomas Fang Zheng

    Abstract: Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise-disentanglement to establish a noise-independent speaker invariant embedding space. Specifically, the disentanglement module includes two encoders for separating speaker related and irrelevant information, re…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 5 pages, accepted by Interspeech2024

    Report number: 707-711

    Journal ref: Interspeech2024

  50. arXiv:2408.04593  [pdf, other]

    cs.CV cs.RO eess.IV

    SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

    Authors: Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren

    Abstract: The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-sh…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Empirical study. Previous work "SAM Meets Robotic Surgery" is accessible at: arXiv:2308.07156