Search | arXiv e-print repository

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Authors: Chen Wang, Tianyu Peng, Wen Yang, Yinan Bai, Guangfu Wang, Jun Lin, Lanpeng Jia, Lingxiang Wu, Jinqiao Wang, Chengqing Zong, Jiajun Zhang

Abstract: Empathetic interaction is a cornerstone of human-machine communication, due to the need for understanding speech enriched with paralinguistic cues and generating emotional and expressive responses. However, the most powerful empathetic large speech language models (LSLMs) are increasingly closed off, leaving the crucial details about the architecture, data and development opaque to researchers. Given the critical need for transparent research into the LSLMs and empathetic behavior, we present OpenS2S, a fully open-source, transparent and end-to-end LSLM designed to enable empathetic speech interactions. Based on our empathetic speech-to-text model BLSP-Emo, OpenS2S further employs a streaming interleaved decoding architecture to achieve low-latency speech generation. To facilitate end-to-end training, OpenS2S incorporates an automated data construction pipeline that synthesizes diverse, high-quality empathetic speech dialogues at low cost. By leveraging large language models to generate empathetic content and controllable text-to-speech systems to introduce speaker and emotional variation, we construct a scalable training corpus with rich paralinguistic diversity and minimal human supervision. We release the fully open-source OpenS2S model, including the dataset, model weights, pre-training and fine-tuning code, to empower the broader research community and accelerate innovation in empathetic speech systems. The project webpage can be accessed at https://casia-lm.github.io/OpenS2S

Submitted 8 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: Technical Report

arXiv:2507.04515 [pdf, ps, other]

A Quadratic Programming Algorithm with $O(n^3)$ Time Complexity

Authors: Liang Wu, Richard D. Braatz

Abstract: Solving linear systems and quadratic programming (QP) problems are both ubiquitous tasks in the engineering and computing fields. Direct methods for solving systems, such as Cholesky, LU, and QR factorizations, exhibit data-independent time complexity of $O(n^3)$. This raises a natural question: could there exist algorithms for solving QPs that also achieve data-independent time complexity of $O(n^3)$? This is critical for offering an execution time certificate for real-time optimization-based applications such as model predictive control. This article first demonstrates that solving real-time strictly convex QPs, Lasso problems, and support vector machine problems can be turned into solving box-constrained QPs (Box-QPs), which support a cost-free initialization strategy for feasible interior-point methods (IPMs). Next, focusing on solving Box-QPs, this article replaces the exact Newton step with an approximated Newton step (substituting the matrix-inversion operation with multiple rank-1 updates) within feasible IPMs. For the first time, this article proposes an implementable feasible IPM algorithm with $O(n^3)$ time complexity, by proving the number of iterations is exactly $O(\sqrt{n})$ and the number of rank-1 updates is bounded by $O(n)$. Numerical validations/applications and codes are provided.
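Editor's sketch: the abstract above mentions replacing a matrix inversion with rank-1 updates. One standard way to apply a rank-1 update to a known inverse is the Sherman-Morrison formula, which costs $O(n^2)$ arithmetic operations per update instead of the $O(n^3)$ of a fresh inversion. The abstract does not say which update formula the paper uses, so the function below (its name and signature are hypothetical) only illustrates the general idea, not the authors' algorithm.

```python
def sherman_morrison_update(a_inv, u, v):
    """Return the inverse of (A + u v^T), given a_inv = A^{-1}.

    Hypothetical helper for illustration only; uses plain lists so it has
    no third-party dependencies. Runs in O(n^2) time.
    """
    n = len(a_inv)
    # A^{-1} u  (column vector) and  v^T A^{-1}  (row vector): O(n^2) each.
    a_inv_u = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    vt_a_inv = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    # Scalar 1 + v^T A^{-1} u; the update is undefined when this is zero.
    denom = 1.0 + sum(v[k] * a_inv_u[k] for k in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("A + u v^T is (numerically) singular")
    # (A + u v^T)^{-1} = A^{-1} - (A^{-1} u)(v^T A^{-1}) / denom
    return [
        [a_inv[i][j] - a_inv_u[i] * vt_a_inv[j] / denom for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # A = I, u = v = e_1, so A + u v^T = diag(2, 1).
    print(sherman_morrison_update(identity, [1.0, 0.0], [1.0, 0.0]))
```

In the usage example, the updated matrix is diag(2, 1), so the returned inverse should be diag(0.5, 1).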

Submitted 6 July, 2025; originally announced July 2025.

Comments: 16 pages

arXiv:2507.02289 [pdf, ps, other]

CineMyoPS: Segmenting Myocardial Pathologies from Cine Cardiac MR

Authors: Wangbin Ding, Lei Li, Junyi Qiu, Bogen Lin, Mingjing Yang, Liqin Huang, Lianming Wu, Sihan Wang, Xiahai Zhuang

Abstract: Myocardial infarction (MI) is a leading cause of death worldwide. Late gadolinium enhancement (LGE) and T2-weighted cardiac magnetic resonance (CMR) imaging can respectively identify scarring and edema areas, both of which are essential for MI risk stratification and prognosis assessment. Although combining complementary information from multi-sequence CMR is useful, acquiring these sequences can be time-consuming and prohibitive, e.g., due to the administration of contrast agents. Cine CMR is a rapid and contrast-free imaging technique that can visualize both motion and structural abnormalities of the myocardium induced by acute MI. Therefore, we present a new end-to-end deep neural network, referred to as CineMyoPS, to segment myocardial pathologies, i.e., scars and edema, solely from cine CMR images. Specifically, CineMyoPS extracts both motion and anatomy features associated with MI. Given the interdependence between these features, we design a consistency loss (resembling the co-training strategy) to facilitate their joint learning. Furthermore, we propose a time-series aggregation strategy to integrate MI-related features across the cardiac cycle, thereby enhancing segmentation accuracy for myocardial pathologies. Experimental results on a multi-center dataset demonstrate that CineMyoPS achieves promising performance in myocardial pathology segmentation, motion estimation, and anatomy segmentation.

Submitted 2 July, 2025; originally announced July 2025.

arXiv:2506.18386 [pdf, ps, other]

Aperiodic-sampled neural network controllers with closed-loop stability verifications (extended version)

Authors: Renjie Ma, Zhijian Hu, Rongni Yang, Ligang Wu

Abstract: In this paper, we synthesize two aperiodic-sampled deep neural network (DNN) control schemes, based on the closed-loop tracking stability guarantees. By means of the integral quadratic constraint coping with the input-output behaviour of system uncertainties/nonlinearities and the convex relaxations of nonlinear DNN activations leveraging their local sector-bounded attributes, we establish conditions to design the event- and self-triggered logics and to compute the ellipsoidal inner approximations of the region of attraction, respectively. Finally, we present a numerical example of an inverted pendulum to illustrate the effectiveness of the proposed aperiodic-sampled DNN control schemes.

Submitted 23 June, 2025; originally announced June 2025.

Comments: 17 pages, 10 figures

arXiv:2505.17912 [pdf, ps, other]

UltraBoneUDF: Self-supervised Bone Surface Reconstruction from Ultrasound Based on Neural Unsigned Distance Functions

Authors: Luohong Wu, Matthias Seibold, Nicola A. Cavalcanti, Giuseppe Loggia, Lisa Reissner, Bastian Sigrist, Jonas Hein, Lilian Calvet, Arnd Viehöfer, Philipp Fürnstahl

Abstract: Background: Bone surface reconstruction plays a critical role in computer-assisted orthopedic surgery. Compared to traditional imaging modalities such as CT and MRI, ultrasound offers a radiation-free, cost-effective, and portable alternative. Continuous bone surface reconstruction can be employed for many clinical applications. However, due to the inherent limitations of ultrasound imaging, B-mode ultrasound typically captures only partial bone surfaces. Existing reconstruction methods struggle with such incomplete data, leading to artifacts and increased reconstruction errors. Effective techniques for accurately reconstructing thin and open bone surfaces from real-world 3D ultrasound volumes remain lacking. Methods: We propose UltraBoneUDF, a self-supervised framework designed for reconstructing open bone surfaces from ultrasound using neural Unsigned Distance Functions. To enhance reconstruction quality, we introduce a novel global feature extractor that effectively fuses ultrasound-specific image characteristics. Additionally, we present a novel loss function based on local tangent plane optimization that substantially improves surface reconstruction quality. UltraBoneUDF and baseline models are extensively evaluated on four open-source datasets. Results: Qualitative results highlight the limitations of the state-of-the-art methods for open bone surface reconstruction and demonstrate the effectiveness of UltraBoneUDF. Quantitatively, UltraBoneUDF significantly outperforms competing methods across all evaluated datasets for both open and closed bone surface reconstruction in terms of mean Chamfer distance error: 1.10 mm on the UltraBones100k dataset (39.6% improvement compared to the SOTA), 0.23 mm on the OpenBoneCT dataset (69.3% improvement), 0.18 mm on the ClosedBoneCT dataset (70.2% improvement), and 0.05 mm on the Prostate dataset (55.3% improvement).
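Editor's sketch: the results above are reported as mean Chamfer distance. Definitions of this metric vary between papers (squared or unsquared distances, summed or averaged over the two directions), and the abstract does not state which variant is used. The function below is a minimal sketch of one common symmetric variant, assuming unsquared Euclidean distances averaged over both directions; the function name is hypothetical and the brute-force search is only suitable for small point sets.

```python
import math


def mean_chamfer_distance(points_a, points_b):
    """Symmetric mean Chamfer distance between two non-empty point sets.

    Assumed variant: for each point, take the Euclidean distance to its
    nearest neighbour in the other set, average per direction, then average
    the two directions. Brute force, O(len(a) * len(b)).
    """
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def one_sided(src, dst):
        # Mean nearest-neighbour distance from every point in src to dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_sided(points_a, points_b) + one_sided(points_b, points_a))


if __name__ == "__main__":
    reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(mean_chamfer_distance(reconstructed, reference))
```

In the usage example every point lies exactly one unit from its nearest neighbour in the other set, so the distance is 1 in whatever length unit the coordinates use.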

Submitted 7 July, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.15861 [pdf, ps, other]

P3Net: Progressive and Periodic Perturbation for Semi-Supervised Medical Image Segmentation

Authors: Zhenyan Yao, Miao Zhang, Lanhu Wu, Yongri Piao, Feng Tian, Weibing Sun, Huchuan Lu

Abstract: Perturbation with diverse unlabeled data has proven beneficial for semi-supervised medical image segmentation (SSMIS). While many works have successfully used various perturbation techniques, a deeper understanding of learning perturbations is needed. Excessive or inappropriate perturbation can have negative effects, so we aim to address two challenges: how to use perturbation mechanisms to guide the learning of unlabeled data through labeled data, and how to ensure accurate predictions in boundary regions. Inspired by human progressive and periodic learning, we propose a progressive and periodic perturbation mechanism (P3M) and a boundary-focused loss. P3M enables dynamic adjustment of perturbations, allowing the model to gradually learn them. Our boundary-focused loss encourages the model to concentrate on boundary regions, enhancing sensitivity to intricate details and ensuring accurate predictions. Experimental results demonstrate that our method achieves state-of-the-art performance on two 2D and 3D datasets. Moreover, P3M is extendable to other methods, and the proposed loss serves as a universal tool for improving existing methods, highlighting the scalability and applicability of our approach.

Submitted 21 May, 2025; originally announced May 2025.

arXiv:2505.15135 [pdf, other]

doi 10.1007/978-3-031-74561-4_6

Physics-Guided Multi-View Graph Neural Network for Schizophrenia Classification via Structural-Functional Coupling

Authors: Badhan Mazumder, Ayush Kanyal, Lei Wu, Vince D. Calhoun, Dong Hye Ye

Abstract: Clinical studies reveal disruptions in brain structural connectivity (SC) and functional connectivity (FC) in neuropsychiatric disorders such as schizophrenia (SZ). Traditional approaches might rely solely on SC due to limited functional data availability, hindering comprehension of cognitive and behavioral impairments in individuals with SZ by neglecting the intricate SC-FC interrelationship. To tackle the challenge, we propose a novel physics-guided deep learning framework that leverages a neural oscillation model to describe the dynamics of a collection of interconnected neural oscillators, which operate via nerve fibers dispersed across the brain's structure. Our proposed framework utilizes SC to simultaneously generate FC by learning SC-FC coupling from a system dynamics perspective. Additionally, it employs a novel multi-view graph neural network (GNN) with a joint loss to perform correlation-based SC-FC fusion and classification of individuals with SZ. Experiments conducted on a clinical dataset exhibited improved performance, demonstrating the robustness of our proposed approach.

Submitted 21 May, 2025; originally announced May 2025.

Comments: Accepted and presented at the 7th International Workshop on PRedictive Intelligence in MEdicine (Held in Conjunction with MICCAI 2024)

arXiv:2504.21594 [pdf]

Switching Transients in Constrained Transformer-Line/Cable Configurations

Authors: Y. Xiang, L. Wu, K. Velitsikakis, A. L. J. Janssen

Abstract: This paper investigates the transient phenomena that occur in two special cases in the Netherlands: (A) during the energization of a power transformer via a cable feeder and (B) the energization of a power transformer together with an overhead line (OHL). In Case A a 7 km long 150 kV cable and a 150/50 kV transformer are connected and energized at the same time. In Case B a 150/50 kV transformer and a short 50 kV OHL are connected and energized simultaneously. The reason behind this kind of situation is related to space restrictions and cost efficiency.

Submitted 30 April, 2025; originally announced April 2025.

Comments: 11 pages, 17 figures, CIGRE conference 2016

arXiv:2504.12527 [pdf]

Analysis of the MICCAI Brain Tumor Segmentation -- Metastases (BraTS-METS) 2025 Lighthouse Challenge: Brain Metastasis Segmentation on Pre- and Post-treatment MRI

Authors: Nazanin Maleki, Raisa Amiruddin, Ahmed W. Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, Justin Cramer, Mark Krycia, Elizabeth B. Shrickel, Ichiro Ikuta, Gerard Thompson, Lorenna Vidal, Vilma Kosovic, Adam E. Goldman-Yassen, Virginia Hill, Tiffany So, Sedra Mhana, Albara Alotaibi, Nathan Page, Prisha Bhatia, Melisa S. Guelen, et al. (219 additional authors not shown)

Abstract: Despite continuous advancements in cancer treatment, brain metastatic disease remains a significant complication of primary cancer and is associated with an unfavorable prognosis. One approach for improving diagnosis, management, and outcomes is to implement algorithms based on artificial intelligence for the automated segmentation of both pre- and post-treatment MRI brain images. Such algorithms rely on volumetric criteria for lesion identification and treatment response assessment, which are still not available in clinical practice. Therefore, it is critical to establish rapid volumetric segmentation methods that can be translated to clinical practice and that are trained on high-quality annotated data. The BraTS-METS 2025 Lighthouse Challenge aims to address this critical need by establishing inter-rater and intra-rater variability in dataset annotation by generating high-quality annotated datasets from four individual instances of segmentation by neuroradiologists while being recorded on video (two instances doing "from scratch" and two instances after AI pre-segmentation). This high-quality annotated dataset will be used for the testing phase of the 2025 Lighthouse challenge and will be publicly released at the completion of the challenge. The 2025 Lighthouse challenge will also release the 2023 and 2024 segmented datasets that were annotated using an established pipeline of pre-segmentation, student annotation, two neuroradiologists checking, and one neuroradiologist finalizing the process. It builds upon its previous edition by including post-treatment cases in the dataset. Using these high-quality annotated datasets, the 2025 Lighthouse challenge plans to test benchmark algorithms for automated segmentation of pre- and post-treatment brain metastases (BM), trained on diverse and multi-institutional datasets of MRI images obtained from patients with brain metastases.

Submitted 10 July, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

Comments: 28 pages, 4 figures, 2 tables

arXiv:2503.13400 [pdf, other]

U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord

Authors: Qi Zhang, Xiuyuan Chen, Ziyi He, Kun Wang, Lianming Wu, Hongxing Shen, Jianqi Sun

Abstract: T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy. However, current clinical diagnoses primarily rely on manual evaluation. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. Unsupervised anomaly detection (UAD) offers a compelling alternative by eliminating the need for abnormal data annotations. However, existing UAD methods rely on curated normal datasets and their performance frequently deteriorates when applied to clinical datasets due to domain shifts. We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations. Unlike traditional methods, U2AD is designed to be trained and tested within the same clinical dataset, following a "mask-and-reconstruction" paradigm built on a Vision Transformer-based architecture. We introduce an uncertainty-guided masking strategy to resolve task conflicts between normal reconstruction and anomaly detection to achieve an optimal balance. Specifically, we employ a Monte-Carlo sampling technique to estimate reconstruction uncertainty mappings during training. By iteratively optimizing reconstruction training under the guidance of both epistemic and aleatoric uncertainty, U2AD reduces overall reconstruction variance while emphasizing regions. Experimental results demonstrate that U2AD outperforms existing supervised and unsupervised methods in patient-level identification and segment-level localization tasks. This framework establishes a new benchmark for incorporating uncertainty guidance into UAD, highlighting its clinical utility in addressing domain shifts and task conflicts in medical image anomaly detection. Our code is available at https://github.com/zhibaishouheilab/U2AD

Submitted 17 March, 2025; originally announced March 2025.

arXiv:2503.06114 [pdf, other]

Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis

Authors: Qi Zhang, Xiuyuan Chen, Ziyi He, Lianming Wu, Kun Wang, Jianqi Sun, Hongxing Shen

Abstract: Cervical spondylosis, a complex and prevalent condition, demands precise and efficient diagnostic techniques for accurate assessment. While MRI offers detailed visualization of cervical spine anatomy, manual interpretation remains labor-intensive and prone to error. To address this, we developed an innovative AI-assisted Expert-based Diagnosis System that automates both segmentation and diagnosis of cervical spondylosis using MRI. Leveraging a dataset of 960 cervical MRI images from patients with cervical disc herniation, our system features a pathology-guided segmentation model capable of accurately segmenting key cervical anatomical structures. The segmentation is followed by an expert-based diagnostic framework that automates the calculation of critical clinical indicators. Our segmentation model achieved an impressive average Dice coefficient exceeding 0.90 across four cervical spinal anatomies and demonstrated enhanced accuracy in herniation areas. Diagnostic evaluation further showcased the system's precision, with a mean absolute error (MAE) of 2.44 degrees for the C2-C7 Cobb angle and 3.60 percent for the Maximum Spinal Cord Compression (MSCC) coefficient. In addition, our method delivered high accuracy, precision, recall, and F1 scores in herniation localization, K-line status assessment, and T2 hyperintensity detection. Comparative analysis demonstrates that our system outperforms existing methods, establishing a new benchmark for segmentation and diagnostic tasks for cervical spondylosis.
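Editor's sketch: the abstract above reports a Dice coefficient for segmentation and a mean absolute error (MAE) for the derived clinical indicators. The sketch below shows the textbook definitions of both metrics on flat Python lists. The function names, the flattened binary-mask input format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2 * |P intersect T| / (|P| + |T|) for flat binary masks (0/1)."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(predicted, measured):
    """MAE = mean of |predicted_i - measured_i| over paired values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)


if __name__ == "__main__":
    # Toy values only; these are not data from the paper.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(mean_absolute_error([10.0, 20.0], [12.0, 17.0]))
```

In the toy example the masks share one foreground pixel out of three foreground pixels in total, so Dice is 2/3; the two angle errors are 2 and 3, so the MAE is 2.5.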

Submitted 8 March, 2025; originally announced March 2025.

arXiv:2503.03971 [pdf, other]

Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

Authors: Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang, Michael Tänzer, Qirong Li, Xinran Chen, Jiahao Huang, Yinzhe Wu, Kian Anvari Hamedani, Yuntong Lyu, Longyu Sun, Qing Li, Ziqiang Xu, Bingyu Xin, Dimitris N. Metaxas, Narges Razizadeh, Shahabedin Nabavi, George Yiasemis, et al. (34 additional authors not shown)

Abstract: Cardiovascular magnetic resonance (CMR) imaging offers diverse contrasts for non-invasive assessment of cardiac function and myocardial characterization. However, CMR often requires the acquisition of many contrasts, and each contrast takes a considerable amount of time. The extended acquisition time will further increase the susceptibility to motion artifacts. Existing deep learning-based reconstruction methods have been proven to perform well in image reconstruction tasks, but most of them are designed for a specific acquisition modality or dedicated imaging parameters, which limits their ability to generalize across a variety of scan scenarios. To address this issue, the CMRxRecon2024 challenge consists of two specific tasks: Task 1 focuses on a modality-universal setting, evaluating the out-of-distribution generalization of existing learning-based models, while Task 2 follows a k-space sampling-universal setting, assessing the all-in-one adaptability of universal models. Main contributions of this challenge include providing the largest publicly available multi-modality, multi-view cardiac k-space dataset; and developing an open benchmarking platform for algorithm evaluation and shared code library for data processing. In addition, through a detailed analysis of the results submitted to the challenge, we have also made several findings, including: 1) adaptive prompt-learning embedding is an effective means for achieving strong generalization in reconstruction models; 2) enhanced data consistency based on physics-informed networks is also an effective pathway toward a universal model; 3) traditional evaluation metrics have limitations when assessing ground-truth references with moderate or lower image quality, highlighting the need for subjective evaluation methods. This challenge attracted 200 participants from 18 countries, aimed at promoting their translation into clinical practice.

Submitted 13 March, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

arXiv:2502.18519 [pdf, other]

FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition

Authors: Linshan Wu, Jiaxin Zhuang, Yanning Zhou, Sunan He, Jiabo Ma, Luyang Luo, Xi Wang, Xuefeng Ni, Xiaoling Zhong, Mingxiang Wu, Yinghua Zhao, Xiaohui Duan, Varut Vardhanabhuti, Pranav Rajpurkar, Hao Chen

Abstract: Tumor is a leading cause of death worldwide, with an estimated 10 million deaths attributed to tumor-related diseases every year. AI-driven tumor recognition unlocks new possibilities for more precise and intelligent tumor screening and diagnosis. However, the progress is heavily hampered by the scarcity of annotated datasets, which demands extensive annotation efforts by radiologists. To tackle this challenge, we introduce FreeTumor, an innovative Generative AI (GAI) framework to enable large-scale tumor synthesis for mitigating data scarcity. Specifically, FreeTumor effectively leverages a combination of limited labeled data and large-scale unlabeled data for tumor synthesis training. Unleashing the power of large-scale data, FreeTumor is capable of synthesizing a large number of realistic tumors on images for augmenting training datasets. To this end, we create the largest training dataset for tumor synthesis and recognition by curating 161,310 publicly available Computed Tomography (CT) volumes from 33 sources, with only 2.3% containing annotated tumors. To validate the fidelity of synthetic tumors, we engaged 13 board-certified radiologists in a Visual Turing Test to discern between synthetic and real tumors. Rigorous clinician evaluation validates the high quality of our synthetic tumors, as they achieved only 51.1% sensitivity and 60.8% accuracy in distinguishing our synthetic tumors from real ones. Through high-quality tumor synthesis, FreeTumor scales up the recognition training datasets by over 40 times, showcasing a notable superiority over state-of-the-art AI methods including various synthesis methods and foundation models. These findings indicate promising prospects of FreeTumor in clinical applications, potentially advancing tumor treatments and improving the survival rates of patients.

Submitted 23 February, 2025; originally announced February 2025.

arXiv:2502.07738 [pdf, other]

EIQP: Execution-time-certified and Infeasibility-detecting QP Solver

Authors: Liang Wu, Wei Xiao, Richard D. Braatz

Abstract: Solving real-time quadratic programming (QP) is a ubiquitous task in control engineering, such as in model predictive control and control barrier function-based QP. In such real-time scenarios, certifying that the employed QP algorithm can either return a solution within a predefined level of optimality or detect QP infeasibility before the predefined sampling time is a pressing requirement. This article considers convex QP (including linear programming) and adopts its homogeneous formulation to achieve infeasibility detection. Exploiting this homogeneous formulation, this article proposes a novel infeasible interior-point method (IPM) algorithm with the best theoretical $O(\sqrt{n})$ iteration complexity that feasible IPM algorithms enjoy. The iteration complexity is proved to be exact (rather than an upper bound), simple to calculate, and data independent, with the value $\left\lceil\frac{\log\left(\frac{n+1}{\epsilon}\right)}{-\log\left(1-\frac{0.414213}{\sqrt{n+1}}\right)}\right\rceil$ (where $n$ and $\epsilon$ denote the number of constraints and the predefined optimality level, respectively), making it appealing to certify the execution time of online time-varying convex QPs. The proposed algorithm is simple to implement without requiring a line search procedure (uses the full Newton step), and its C-code implementation (offering MATLAB, Julia, and Python interfaces) and numerical examples are publicly available at https://github.com/liangwu2019/EIQP.
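Editor's sketch: the iteration count quoted in the abstract above depends only on the number of constraints $n$ and the optimality level $\epsilon$, so it can be evaluated before any problem data is known. The short function below evaluates that formula directly. The function name is hypothetical and is not part of the EIQP package; the input checks are assumptions added for illustration, and natural logarithms are used (the base cancels in the ratio).

```python
import math


def eiqp_iteration_count(n, eps):
    """Evaluate ceil( log((n+1)/eps) / -log(1 - 0.414213/sqrt(n+1)) ).

    n   : number of constraints (positive integer), as defined in the abstract
    eps : predefined optimality level (assumed here to satisfy 0 < eps < n + 1)
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 < eps < n + 1:
        raise ValueError("eps must lie strictly between 0 and n + 1")
    numerator = math.log((n + 1) / eps)
    # 0.414213 / sqrt(n + 1) < 1 for every n >= 1, so the log is well defined.
    denominator = -math.log(1.0 - 0.414213 / math.sqrt(n + 1))
    return math.ceil(numerator / denominator)


if __name__ == "__main__":
    for n in (3, 10, 100, 1000):
        print(n, eiqp_iteration_count(n, 1e-6))
```

Because the count is fixed in advance, multiplying it by a measured per-iteration cost on the target hardware gives the kind of execution-time bound the abstract describes.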

Submitted 14 February, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

Comments: 14 pages, 3 figures

arXiv:2502.03783 [pdf]

doi 10.1016/j.compbiomed.2025.110435

UltraBones100k: A reliable automated labeling method and large-scale dataset for ultrasound-based bone surface extraction

Authors: Luohong Wu, Nicola A. Cavalcanti, Matthias Seibold, Giuseppe Loggia, Lisa Reissner, Jonas Hein, Silvan Beeler, Arnd Viehöfer, Stephan Wirth, Lilian Calvet, Philipp Fürnstahl

Abstract: Ultrasound-based bone surface segmentation is crucial in computer-assisted orthopedic surgery. However, ultrasound images have limitations, including a low signal-to-noise ratio and acoustic shadowing, which make interpretation difficult. Existing deep learning models for bone segmentation rely primarily on costly manual labeling by experts, limiting dataset size and model generalizability. Additionally, the complexity of ultrasound physics and acoustic shadow makes the images difficult for humans to interpret, leading to incomplete labels in anechoic regions and limiting model performance. To advance ultrasound bone segmentation and establish effective model benchmarks, larger and higher-quality datasets are needed. We propose a methodology for collecting ex-vivo ultrasound datasets with automatically generated bone labels, including anechoic regions. The proposed labels are derived by accurately superimposing tracked bone CT models onto the tracked ultrasound images. These initial labels are refined to account for ultrasound physics. A clinical evaluation is conducted by an expert physician specializing in orthopedic sonography to assess the quality of the generated bone labels. A neural network for bone segmentation is trained on the collected dataset and its predictions are compared to expert manual labels, evaluating accuracy, completeness, and F1-score. We collected the largest known dataset of 100k ultrasound images of human lower limbs with bone labels, called UltraBones100k. A Wilcoxon signed-rank test with Bonferroni correction confirmed that the bone alignment after our method significantly improved the quality of bone labeling (p < 0.001). The model trained on UltraBones100k consistently outperforms manual labeling in all metrics, particularly in low-intensity regions (320% improvement in completeness at a distance threshold of 0.5 mm).

Submitted 4 June, 2025; v1 submitted 6 February, 2025; originally announced February 2025.

Comments: accepted by Computers in Biology and Medicine

Journal ref: Computers in Biology and Medicine 194 (2025) 110435

arXiv:2501.14792 [pdf, other]

A Wearable Strain-Sensor-Based Shoulder Patch for Fatigue Detection in Bicep Curls

Authors: Ming Xuan Chua, Shuhua Peng, Thanh Nho Do, Chun Hui Wang, Liao Wu

Abstract: A common challenge in home-based rehabilitation is muscle compensation induced by pain or fatigue, where patients with weakened primary muscles recruit secondary muscle groups to assist their movement, causing issues such as delayed rehabilitation progress or risk of further injury. In a home-based setting, the subtle compensatory actions may not be perceived since physiotherapists cannot directly observe patients. To address this problem, this study develops a novel wearable strain-sensor-based shoulder patch to detect fatigue-induced muscle compensation during bicep curl exercises. Built on an observation that the amplitude of a strain sensor's resistance is correlated to the motion of a joint that the sensor is attached to, we develop an algorithm that can robustly detect the state when significant changes appear in the shoulder joint motion, which indicates fatigue-induced muscle compensation in bicep curls. The developed shoulder patch is tested on 13 subjects who perform bicep curl exercises with a 5 kg dumbbell until reaching fatigue. During the experiment, the performance of the shoulder patch is also benchmarked with optical tracking sensors and surface electromyography (sEMG) sensors. Results reveal that the proposed wearable sensor and detection methods effectively monitor fatigue-induced muscle compensation during bicep curl exercises in both Real-Time and Post Hoc modes. This development marks a significant step toward enhancing the effectiveness of home-based rehabilitation by providing physiotherapists with a tool to monitor and adjust treatment plans remotely.

Submitted 10 January, 2025; originally announced January 2025.

Comments: 12 pages, 13 figures, submitted to T-IM

arXiv:2501.09759 [pdf]

A wideband amplifying and filtering reconfigurable intelligent surface for wireless relay

Authors: Lijie Wu, Qun Yan Zhou, Jun Yan Dai, Siran Wang, Junwei Zhang, Zhen Jie Qi, Hanqing Yang, Ruizhe Jiang, Zheng Xing Wang, Huidong Li, Zhen Zhang, Jiang Luo, Qiang Cheng, Tie Jun Cui

Abstract: Programmable metasurfaces have garnered significant attention due to their exceptional ability to manipulate electromagnetic (EM) waves in real time, leading to the emergence of a prominent area in wireless communication, namely reconfigurable intelligent surfaces (RISs), to control the signal propagation and coverage. However, the existing RISs usually suffer from limited operating distance and band interference, which hinder their practical applications in wireless relay and communication systems. To overcome the limitations, we propose an amplifying and filtering RIS (AF-RIS) to enhance the in-band signal energy and filter the out-of-band signal of the incident EM waves, ensuring the miniaturization of the RIS array and enabling its anti-interference ability. In addition, each AF-RIS element is equipped with a 2-bit phase control capability, further endowing the entire array with great beamforming performance. An elaborately designed 4*8 AF-RIS array is presented by integrating the power dividing and combining networks, which substantially reduces the number of amplifiers and filters, thereby reducing the hardware costs and power consumption. Experimental results showcase the powerful capabilities of AF-RIS in beam-steering, frequency selectivity, and signal amplification. Therefore, the proposed AF-RIS holds significant promise for critical applications in wireless relay systems by offering an efficient solution to improve frequency selectivity, enhance signal coverage, and reduce hardware size.

Submitted 31 December, 2024; originally announced January 2025.

arXiv:2501.04839 [pdf, other]

DRL-Based Medium-Term Planning of Renewable-Integrated Self-Scheduling Cascaded Hydropower to Guide Wholesale Market Participation

Authors: Xianbang Chen, Yikui Liu, Neng Fan, Lei Wu

Abstract: For self-scheduling cascaded hydropower (S-CHP) facilities, medium-term planning is a critical step that coordinates water availability over the medium-term horizon, providing water usage guidance for their short-term operations in wholesale market participation. Typically, medium-term planning strategies (e.g., reservoir storage targets at the end of each short-term period) are determined by either optimization methods or rules of thumb. However, with the integration of variable renewable energy sources (VRESs), optimization-based methods suffer from deviations between the anticipated and actual reservoir storage, while rules of thumb could be financially conservative, thereby compromising short-term operating profitability in wholesale market participation. This paper presents a deep reinforcement learning (DRL)-based framework to derive medium-term planning policies for VRES-integrated S-CHPs (VS-CHPs), which can leverage contextual information underneath individual short-term periods and train planning policies by their induced short-term operating profits in wholesale market participation. The proposed DRL-based framework offers two practical merits. First, its planning strategies consider both seasonal requirements of reservoir storage and needs for short-term operating profits. Second, it adopts a multi-parametric programming-based strategy to accelerate the expensive training process associated with multi-step short-term operations. Finally, the DRL-based framework is evaluated on a real-world VS-CHP, demonstrating its advantages over current practice.

Submitted 8 January, 2025; originally announced January 2025.

arXiv:2501.02718 [pdf]

Multi-Transmission Node DER Aggregation: Chance-Constrained Unit Commitment with Bounded Hetero-Dimensional Mixture Model for Uncertain Distribution Factors

Authors: Weilun Wang, Zhentong Shao, Yikui Liu, Brent Eldridge, Abhishek Somani, Jesse T. Holzer, Lei Wu

Abstract: To facilitate the integration of distributed energy resources (DERs) into the wholesale market while maintaining the tractability of associated market operation tools such as unit commitment (UC), existing DER aggregation (DERA) studies usually consider that each DERA is presented on a single node of the transmission network. Nevertheless, the increasing scale and geographical distribution of DERs spur the emergence of DERAs covering multiple transmission nodes, posing new challenges in modeling such multi-transmission-node DERAs (M-DERAs). Indeed, assessing the aggregated impact of an M-DERA on power flows is a non-trivial task, because the sensitivities of each transmission line to DERs at different transmission nodes are not identical. Inspired by the distribution factor (DF) based shift factor (SF) aggregation strategy in industry practice, this paper proposes a novel DF-based chance-constrained UC (CCUC) model to determine system optimal operation plans with M-DERAs. DFs, treated as uncertain parameters to describe possible responses of DERs against aggregated dispatch instructions from regional transmission organizations, are modeled via a bounded hetero-dimensional mixture model (BHMM) by leveraging historical DF records distributed on multiple hyperplanes in a bounded space. With this, power flow limits are modeled as chance constraints in CCUC, which is reformulated into a scenarios-based stochastic form and solved by Benders decomposition. The proposed method is tested on an IEEE 24-bus system to illustrate its effectiveness in managing M-DERA integration while ensuring operational economics and mitigating the overloading of transmission lines.

Submitted 5 January, 2025; originally announced January 2025.

Comments: 10 pages, 9 figures

arXiv:2501.02410 [pdf, ps, other]

doi 10.1109/TMECH.2025.3568801

JammingSnake: A follow-the-leader continuum robot with variable stiffness based on fiber jamming

Authors: Chen Qian, Tangyou Liu, Liao Wu

Abstract: Follow-the-leader (FTL) motion is essential for continuum robots operating in fragile and confined environments. It allows the robot to exert minimal force on its surroundings, reducing the risk of damage. This paper presents a novel design of a snake-like robot capable of achieving FTL motion by integrating fiber jamming modules (FJMs). The proposed robot can dynamically adjust its stiffness during propagation and interaction with the environment. An algorithm is developed to independently control the tendon and FJM insertion movements, allowing the robot to maintain its shape while minimizing the forces exerted on surrounding structures. To validate the proposed design, comparative tests were conducted between a traditional tendon-driven robot and the novel design under different configurations. The results demonstrate that our design relies significantly less on contact with the surroundings to maintain its shape. This highlights its potential for safer and more effective operations in delicate environments, such as minimally invasive surgery (MIS) or industrial in-situ inspection.

Submitted 19 June, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

Comments: 8 pages, 4 figures, published in T-MECH

arXiv:2412.08671 [pdf, other]

A Deep Semantic Segmentation Network with Semantic and Contextual Refinements

Authors: Zhiyan Wang, Deyin Liu, Lin Yuanbo Wu, Song Wang, Xin Guo, Lin Qi

Abstract: Semantic segmentation is a fundamental task in multimedia processing, which can be used for analyzing, understanding, and editing the contents of images and videos, among others. To accelerate the analysis of multimedia data, existing segmentation research tends to extract semantic information by progressively reducing the spatial resolutions of feature maps. However, this approach introduces a misalignment problem when restoring the resolution of high-level feature maps. In this paper, we design a Semantic Refinement Module (SRM) to address this issue within the segmentation network. Specifically, SRM is designed to learn a transformation offset for each pixel in the upsampled feature maps, guided by high-resolution feature maps and neighboring offsets. By applying these offsets to the upsampled feature maps, SRM enhances the semantic representation of the segmentation network, particularly for pixels around object boundaries. Furthermore, a Contextual Refinement Module (CRM) is presented to capture global context information across both spatial and channel dimensions. To balance dimensions between channel and space, we aggregate the semantic maps from all four stages of the backbone to enrich channel context information. The efficacy of these proposed modules is validated on three widely used datasets (Cityscapes, Bdd100K, and ADE20K), demonstrating superior performance compared to state-of-the-art methods. Additionally, this paper extends these modules to a lightweight segmentation network, achieving an mIoU of 82.5% on the Cityscapes validation set with only 137.9 GFLOPs.

Submitted 10 December, 2024; originally announced December 2024.

Comments: Accepted by TMM

arXiv:2411.08538 [pdf]

Intelligent Adaptive Metasurface in Complex Wireless Environments

Authors: Han Qing Yang, Jun Yan Dai, Hui Dong Li, Lijie Wu, Meng Zhen Zhang, Zi Hang Shen, Si Ran Wang, Zheng Xing Wang, Wankai Tang, Shi Jin, Jun Wei Wu, Qiang Cheng, Tie Jun Cui

Abstract: The programmable metasurface is regarded as one of the most promising transformative technologies for next-generation wireless system applications. Due to the lack of effective perception ability of the external electromagnetic environment, there are numerous challenges in the intelligent regulation of wireless channels, and it still relies on external sensors to reshape the electromagnetic environment as desired. To address that problem, we propose an adaptive metasurface (AMS) which integrates the capabilities of acquiring wireless environment information and manipulating reflected electromagnetic (EM) waves in a programmable manner. The proposed design endows the metasurfaces with excellent capabilities to sense the complex electromagnetic field distributions around them and then dynamically manipulate the waves and signals in real time under the guidance of the sensed information, eliminating the need for prior knowledge or external inputs about the wireless environment. For verification, a prototype of the proposed AMS is constructed, and its dual capabilities of sensing and manipulation are experimentally validated. Additionally, different integrated sensing and communication (ISAC) scenarios with and without the aid of the AMS are established. The effectiveness of the AMS in enhancing communication quality is well demonstrated in complex electromagnetic environments, highlighting its beneficial application potential in future wireless systems.

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2410.14769 [pdf, ps, other]

Medical Artificial Intelligence for Early Detection of Lung Cancer: A Survey

Authors: Guohui Cai, Ying Cai, Zeyu Zhang, Yuanzhouhan Cao, Lin Wu, Daji Ergu, Zhinbin Liao, Yang Zhao

Abstract: Lung cancer remains one of the leading causes of morbidity and mortality worldwide, making early diagnosis critical for improving therapeutic outcomes and patient prognosis. Computer-aided diagnosis systems, which analyze computed tomography images, have proven effective in detecting and classifying pulmonary nodules, significantly enhancing the detection rate of early-stage lung cancer. Although traditional machine learning algorithms have been valuable, they exhibit limitations in handling complex sample data. The recent emergence of deep learning has revolutionized medical image analysis, driving substantial advancements in this field. This review focuses on recent progress in deep learning for pulmonary nodule detection, segmentation, and classification. Traditional machine learning methods, such as support vector machines and k-nearest neighbors, have shown limitations, paving the way for advanced approaches like Convolutional Neural Networks, Recurrent Neural Networks, and Generative Adversarial Networks. The integration of ensemble models and novel techniques is also discussed, emphasizing the latest developments in lung cancer diagnosis. Deep learning algorithms, combined with various analytical techniques, have markedly improved the accuracy and efficiency of pulmonary nodule analysis, surpassing traditional methods, particularly in nodule classification. Although challenges remain, continuous technological advancements are expected to further strengthen the role of deep learning in medical diagnostics, especially for early lung cancer detection and diagnosis. A comprehensive list of lung cancer detection models reviewed in this work is available at https://github.com/CaiGuoHui123/Awesome-Lung-Cancer-Detection.

Submitted 20 June, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

Comments: Accepted to Engineering Applications of Artificial Intelligence

arXiv:2410.04743 [pdf, other]

Smart energy management: process structure-based hybrid neural networks for optimal scheduling and economic predictive control in integrated systems

Authors: Long Wu, Xunyuan Yin, Lei Pan, Jinfeng Liu

Abstract: Integrated energy systems (IESs) are complex systems consisting of diverse operating units spanning multiple domains. To address their operational challenges, we propose a physics-informed hybrid time-series neural network (NN) surrogate to predict the dynamic performance of IESs across multiple time scales. This neural network-based modeling approach develops time-series multi-layer perceptrons (MLPs) for the operating units and integrates them with prior process knowledge about system structure and fundamental dynamics. This integration forms three hybrid NNs (long-term, slow, and fast MLPs) that predict the entire system dynamics across multiple time scales. Leveraging these MLPs, we design an NN-based scheduler and an NN-based economic model predictive control (NEMPC) framework to meet global operational requirements: rapid electrical power responsiveness to operators' requests, adequate cooling supply to customers, and increased system profitability, while addressing the dynamic time-scale multiplicity present in IESs. The proposed day-ahead scheduler is formulated using the ReLU network-based MLP, which effectively represents IES performance under a broad range of conditions from a long-term perspective. The scheduler is then exactly recast into a mixed-integer linear programming problem for efficient evaluation. The real-time NEMPC, based on slow and fast MLPs, comprises two sequential distributed control agents: a slow NEMPC for the cooling-dominant subsystem with slower transient responses and a fast NEMPC for the power-dominant subsystem with faster responses. Extensive simulations demonstrate that the developed scheduler and NEMPC schemes outperform their respective benchmark scheduler and controller by about 25% and 40%. Together, they enhance overall system performance by over 70% compared to benchmark approaches.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.09876 [pdf, other]

A Carryover Storage Valuation Framework for Medium-Term Cascaded Hydropower Planning: A Portland General Electric System Study

Authors: Xianbang Chen, Yikui Liu, Zhiming Zhong, Neng Fan, Zhechong Zhao, Lei Wu

Abstract: Medium-term planning of cascaded hydropower (CHP) determines appropriate carryover storage levels in reservoirs to optimize the usage of available water resources. This optimization seeks to maximize the hydropower generated in the current period (i.e., immediate benefit) plus the potential hydropower generation in the future period (i.e., future value). Thus, in the medium-term CHP planning, properly quantifying the future value deposited in carryover storage is essential to achieve a balanced trade-off between immediate benefit and future value. To this end, this paper presents a framework to quantify the future value of carryover storage, which consists of three major steps: i) constructing a model to calculate the maximum possible hydropower generation that a given level of carryover storage can deliver in the future period; ii) extracting the implicit locational marginal water value (LMWV) of carryover storage for each reservoir by applying a partition-then-extract algorithm to the constructed model; and iii) developing a set of analytical rules based on the extracted LMWV to effectively calculate the future value. These rules can be seamlessly integrated into medium-term CHP planning models as tractable mixed-integer linear constraints to quantify the future value properly, and can be easily visualized to offer valuable insights for CHP operators. Finally, numerical results on a CHP system of Portland General Electric demonstrate the effectiveness of the presented framework in determining proper carryover storage values to facilitate medium-term CHP planning.

Submitted 8 January, 2025; v1 submitted 15 September, 2024; originally announced September 2024.

arXiv:2407.08498 [pdf, other]

ERD: Exponential Retinex decomposition based on weak space and hybrid nonconvex regularization and its denoising application

Authors: Liang Wu, Wenjing Lu, Liming Tang, Zhuang Fang

Abstract: The Retinex theory models the image as a product of illumination and reflection components, which has received extensive attention and is widely used in image enhancement, segmentation and color restoration. However, it has been rarely used in additive noise removal due to the inclusion of both multiplication and addition operations in the Retinex noisy image modeling. In this paper, we propose an exponential Retinex decomposition model based on hybrid non-convex regularization and weak space oscillation-modeling for image denoising. The proposed model utilizes non-convex first-order total variation (TV) and non-convex second-order TV to regularize the reflection component and the illumination component, respectively, and employs weak $H^{-1}$ norm to measure the residual component. By utilizing different regularizers, the proposed model effectively decomposes the image into reflection, illumination, and noise components. An alternating direction method of multipliers (ADMM) combined with the Majorize-Minimization (MM) algorithm is developed to solve the proposed model. Furthermore, we provide a detailed proof of the convergence property of the algorithm. Numerical experiments validate both the proposed model and algorithm. Compared with several state-of-the-art denoising models, the proposed model exhibits superior performance in terms of peak signal-to-noise ratio (PSNR) and mean structural similarity (MSSIM).

Submitted 20 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.03753 [pdf]

Enhanced Support Vector Machine Based Signal Recovery in Bandwidth-Limited 50-100 Gbit/s Flexible DS-PON

Authors: Liyan Wu, Yanlu Huang, Kai Jin, Shangya Han, Kun Xu, Yanni Ou

Abstract: We proposed an adaptive signal recovery algorithm with reduced complexity based on the SVM principle for flexible downstream PON. Experimental results indicate a record-high link power budget of 24 dB for bandwidth-limited 100 Gbit/s direct-detection transmission@1E-3.

Submitted 14 February, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: We propose SVM algorithms with different solvers for signal formats like NRZ and PAM4. This simplifies complexity in flexible downstream PON while maintaining performance

arXiv:2406.19856 [pdf]

LUT-Assisted Clock Data Recovery and Equalization for Burst-Mode 50-100 Gbit/s Bandwidth-Limited Flexible PON

Authors: Yanlu Huang, Liyan Wu, Shangya Han, Kai Jin, Kun Xu, Yanni Ou

Abstract: We demonstrated LUT-assisted CDR and equalization for burst-mode 50-100 Gbit/s bandwidth-limited PON, achieving signal recovery under large 100 ppm frequency offsets and 0.5 UI phase mismatch using reduced 50 ns preambles, with only a 0.3 dB sensitivity penalty.

Submitted 14 February, 2025; v1 submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19043 [pdf]

CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Cheng Ouyang, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto, et al. (7 additional authors not shown)

Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space datasets in terms of both quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, towards promoting the universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most protocol-diverse publicly available cardiac k-space dataset. It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. Besides, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation.

Submitted 16 January, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 23 pages, 3 figures, 2 tables

arXiv:2406.10682 [pdf, other]

Inverse Kinematics with Vision-Based Constraints

Authors: Liangting Wu, Roberto Tron

Abstract: This paper introduces the Visual Inverse Kinematics problem (VIK) to fill the gap between robot Inverse Kinematics (IK) and visual servo control. Different from the IK problem, the VIK problem seeks to find robot configurations subject to vision-based constraints, in addition to kinematic constraints. In this work, we develop a formulation of the VIK problem with a Field of View (FoV) constraint, enforcing the visibility of an object from a camera on the robot. Our proposed solution is based on the idea of adding a virtual kinematic chain connecting the physical robot and the object; the FoV constraint is then equivalent to a joint angle kinematic constraint. Along the way, we introduce multiple vision-based cost functions to fulfill different objectives. We solve this formulation of the VIK problem using a method that involves a semidefinite program (SDP) constraint followed by a rank minimization algorithm. The performance of this method for solving the VIK problem is validated through simulations.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2405.18739 [pdf, other]

FlocOff: Data Heterogeneity Resilient Federated Learning with Communication-Efficient Edge Offloading

Authors: Mulei Ma, Chenyu Gong, Liekang Zeng, Yang Yang, Liantao Wu

Abstract: Federated Learning (FL) has emerged as a fundamental learning paradigm to harness massive data scattered at geo-distributed edge devices in a privacy-preserving way. Given the heterogeneous deployment of edge devices, however, their data are usually Non-IID, introducing significant challenges to FL including degraded training accuracy, intensive communication costs, and high computing complexity. Towards that, traditional approaches typically utilize adaptive mechanisms, which may suffer from scalability issues, increased computational overhead, and limited adaptability to diverse edge environments. To address that, this paper instead leverages the observation that computation offloading involves inherent functionalities such as node matching and service correlation to achieve data reshaping, and proposes the Federated learning based on computing Offloading (FlocOff) framework to address data heterogeneity and resource-constrained challenges. Specifically, FlocOff formulates the FL process with Non-IID data in edge scenarios and derives a rigorous analysis of the impact of imbalanced data distribution. Based on this, FlocOff decouples the optimization into two steps, namely: (1) Minimizes the Kullback-Leibler (KL) divergence via Computation Offloading scheduling (MKL-CO); (2) Minimizes the Communication Cost through Resource Allocation (MCC-RA). Extensive experimental results demonstrate that the proposed FlocOff effectively improves model convergence and accuracy by 14.3%-32.7% while reducing data heterogeneity under various data distributions.

Submitted 28 May, 2024; originally announced May 2024.
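
The FlocOff abstract above names the Kullback-Leibler (KL) divergence as the quantity minimized in its first step. As a rough illustration of that metric only (not of the MKL-CO scheduling itself, which the abstract does not detail), the sketch below measures how far one device's label histogram is from a reference distribution. The function names, the label counts, and the choice of a uniform reference are assumptions made for this example.

```python
import math


def normalize(counts):
    """Turn non-negative label counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability term contributes nothing
        if qi == 0:
            return float("inf")  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical device holding mostly one class, compared with a uniform reference.
device_counts = [90, 5, 5]
uniform = [1 / 3, 1 / 3, 1 / 3]

skewed = kl_divergence(normalize(device_counts), uniform)
balanced = kl_divergence(uniform, uniform)
print(skewed > balanced)
```

A larger value indicates a more skewed (more Non-IID) local dataset relative to the chosen reference; identical distributions give zero.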

arXiv:2405.16980 [pdf, other]

DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class uses semantic segmentation networks to pick on a shot gather, called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based picking methods input an image of a shot gather and output a binary segmentation map, in which the maximum of each column is the location of FB. However, currently designed segmentation networks have difficulty ensuring the horizontal continuity of the segmentation. Additionally, FB jumps also exist in some areas, and it is not easy for current networks to detect such jumps. Therefore, it is important to pick as much as possible and ensure horizontal continuity. To alleviate this problem, we propose a novel semantic segmentation network for the 2-D seismic FB picking task, where we introduce the dynamic snake convolution into U-Net and call the new segmentation network dynamic-snake U-Net (DSU-Net). Specifically, we develop the original dynamic-snake convolution (DSConv) in CV and propose a novel DSConv module, which can extract the horizontal continuous feature in the shallow feature of the shot gather. Many experiments have shown that DSU-Net demonstrates higher accuracy and robustness than the other 2-D segmentation-based models, achieving state-of-the-art (SOTA) performance in 2-D seismic field surveys. Particularly, it can effectively detect FB jumps and better ensure the horizontal continuity of FB. In addition, the ablation experiment and the anti-noise experiment, respectively, verify the optimal structure of the DSConv module and the robustness of the picking.

Submitted 27 May, 2024; originally announced May 2024.
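
The read-out step described in the DSU-Net abstract above (the maximum of each column of the segmentation map gives the first-break location) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pick_first_breaks`, the list-of-rows layout (rows are time samples, columns are traces), and the sample scores are assumptions.

```python
def pick_first_breaks(seg_map):
    """Return, for each column (trace), the row index of the largest score.

    seg_map is a list of rows; seg_map[t][x] is the segmentation score at
    time sample t for trace x. Ties resolve to the earliest time sample.
    """
    if not seg_map:
        return []
    n_rows, n_cols = len(seg_map), len(seg_map[0])
    picks = []
    for x in range(n_cols):
        column = [seg_map[t][x] for t in range(n_rows)]
        # max() returns the first maximal index, i.e. the earliest sample.
        picks.append(max(range(n_rows), key=column.__getitem__))
    return picks


# Hypothetical 3-sample by 3-trace score map.
scores = [
    [0.1, 0.0, 0.2],
    [0.9, 0.3, 0.1],
    [0.2, 0.8, 0.7],
]
print(pick_first_breaks(scores))  # [1, 2, 2]
```

Because each column is read out independently, nothing in this step forces neighbouring picks to be close together, which is consistent with the abstract's concern about horizontal continuity.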

arXiv:2405.16084 [pdf, other]

A Low-Cost Teleoperable Surgical Robot with a Macro-Micro Structure and a Continuum Tip for Open-Source Research

Authors: Lachlan Scott, Tangyou Liu, Liao Wu

Abstract: Surgical robotic systems equipped with microscale, high-dexterity manipulators have shown promising results in minimally invasive surgery (MIS). One barrier to the widespread adoption of such systems is the prohibitive cost of research and development efforts using current state-of-the-art equipment. To address this challenge, this paper proposes a low-cost and modifiable tendon-driven continuum manipulator for MIS applications. The device is capable of being teleoperated in conjunction with a macro-scale six-axis robotic arm using a haptic stylus. Its control software incorporates and extends freely available and open-source software packages. For verification, we perform teleoperation trials on the proposed continuum manipulator using an electromagnetic tracker. We then integrate the manipulator with a UR5e robotic arm. A series of simulated tumour biopsies were conducted using the integrated robotic system and an anatomical model (phantom), validating its potential efficacy in MIS applications. The complete source code, CAD files for all additively manufactured components, a parts list for the manipulator, and a demonstration video of the proposed system are made available in this work.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 6 pages, 10 figures, accepted by AIM2024

arXiv:2405.10570 [pdf]

Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.

Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures, 6 tables

arXiv:2404.15978 [pdf, other]

Learning deep Koopman operators with convex stability constraints

Authors: Marc Mitjans, Liangting Wu, Roberto Tron

Abstract: In this paper, we present a novel sufficient condition for the stability of discrete-time linear systems that can be represented as a set of piecewise linear constraints, which makes it suitable for quadratic programming optimization problems. More specifically, we tackle the problem of imposing asymptotic stability on a Koopman matrix learned from data during iterative gradient descent optimization processes. We show that this sufficient condition can be decoupled by rows of the system matrix, and propose a control barrier function-based projected gradient descent to enforce gradual evolution towards the stability set by running an optimization-in-the-loop during the iterative learning process. We compare the performance of our algorithm with two other recent approaches in the literature, and show that we get close to state-of-the-art performance while providing the added flexibility of allowing the optimization problem to be further customized for specific applications.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 7 pages, 3 figures, 1 table, submitted to IEEE Conference on Decision and Control (CDC) 2024
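
The abstract above does not state its stability condition. For orientation only, one classical sufficient condition with the same flavor (piecewise linear in the matrix entries and separable by rows) is that every absolute row sum of the system matrix is below one; this bounds the induced infinity norm, and hence the spectral radius, below one. The sketch below checks that classical condition and is not claimed to be the paper's condition; the function name `row_sum_stable`, the `margin` parameter, and the example matrices are assumptions.

```python
def row_sum_stable(matrix, margin=0.0):
    """Check the classical row-sum test for x[k+1] = A x[k].

    Returns True when every absolute row sum is below 1 - margin, which is
    sufficient (but not necessary) for asymptotic stability.
    """
    return all(sum(abs(v) for v in row) < 1.0 - margin for row in matrix)


# Each row can be checked independently of the others.
A_pass = [[0.5, -0.3],
          [0.1, 0.6]]

# Upper triangular with eigenvalues 0.9 and 0.2, so it is stable,
# yet the first row sum is 1.4 and the sufficient test rejects it.
A_fail = [[0.9, 0.5],
          [0.0, 0.2]]

print(row_sum_stable(A_pass))  # True
print(row_sum_stable(A_fail))  # False
```

The second matrix shows the usual trade-off of a sufficient condition: it is cheap and row-separable, but it can reject matrices that are in fact stable.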

arXiv:2404.01082 [pdf, other]

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

Authors: Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao , et al. (25 additional authors not shown)

Abstract: Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platforms hinders the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitting for both cine and mapping tasks. All teams use deep learning based approaches, indicating that deep learning has predominantly become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as backbones. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in Cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field.

Submitted 16 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 25 pages, 17 figures

arXiv:2403.18235 [pdf, other]

A Parallel Vector-form $LDL^\top$ Decomposition for Accelerating Execution-time-certified $\ell_1$-penalty Soft-constrained MPC

Authors: Liang Wu, Liwei Zhou, Richard D. Braatz

Abstract: Handling possible infeasibility and providing an execution time certificate are two pressing requirements of real-time Model Predictive Control (MPC). To meet these two requirements simultaneously, this paper proposes an $\ell_1$-penalty soft-constrained MPC formulation that is globally feasible and solvable with an execution time certificate using our proposed algorithm. This paper proves for the first time that $\ell_1$-penalty soft-constrained MPC problems can be equivalently transformed into a box-constrained quadratic programming (Box-QP) and then our previous execution-time-certified algorithm \cite{wu2023direct} (only limited to Box-QP) can be applied. However, our previous Box-QP algorithm \cite{wu2023direct}, which provides a theoretical execution-time certificate, is conservative in its iteration analysis, thus sacrificing computation efficiency. To this end, this paper proposes a novel $LDL^\top$ decomposition for the first time, to accelerate the computation of the Newton step at each iteration. The speedup of our $LDL^\top$ decomposition is two-fold: i) exploitation of the fact that the number of inequality constraints is generally larger than the number of variables in condensed MPC formulations, ii) vectorized and parallel implementation based on its vector-wise operations, instead of element-wise operations of previous decomposition methods. Numerical experiments demonstrate great speedups of the proposed $LDL^\top$ decomposition (even up to 1000-fold, compared to the standard Cholesky method), which thus helps our solver achieve comparable computation performance to the state-of-the-art solvers such as IPOPT and OSQP. Code is available at https://github.com/liangwu2019/L1-penalty-QP.

Submitted 8 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 11 pages

arXiv:2403.13643 [pdf]

Vibration Sensitivity of one-port and two-port MEMS microphones

Authors: Francis Doyon-D'Amour, Carly Stalder, Timothy Hodges, Michel Stephan, Lixiue Wu, Triantafillos Koukoulas, Stephane Leahy, Raphael St-Gelais

Abstract: Micro-electro-mechanical system (MEMS) microphones (mics) with two acoustic ports are currently receiving considerable interest, with the promise of achieving higher directional sensitivity compared to traditional one-port architectures. However, measuring pressure differences in two-port microphones typically commands sensing elements that are softer than in one-port mics, and are therefore presumably more prone to interference from external vibration. Here we derive a universal expression for microphone sensitivity to vibration and we experimentally demonstrate its validity for several emerging two-port microphone technologies. We also perform vibration measurements on a one-port mic, thus providing a one-stop direct comparison between one-port and two-port sensing approaches. We find that the acoustically-referred vibration sensitivity of two-port MEMS mics, in units of measured acoustic pressure per external acceleration (i.e., Pascals per g), does not depend on the sensing element stiffness nor on its natural frequency. We also show that this vibration sensitivity in two-port mics is inversely proportional to frequency as opposed to the frequency independent behavior observed in one-port mics. This is confirmed experimentally for several types of microphone packages.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 8 pages, 14 figures

arXiv:2403.13225 [pdf, other]

Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation

Authors: Linshan Wu, Zhun Zhong, Jiayi Ma, Yunchao Wei, Hao Chen, Leyuan Fang, Shutao Li

Abstract: Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models with weak labels and is receiving significant attention due to its low annotation cost. Existing approaches focus on generating pseudo labels for supervision while largely neglecting to leverage the inherent semantic correlation among different pseudo labels. We observe that pseudo-labeled pixels that are close to each other in the feature space are more likely to share the same class, and those closer to the distribution centers tend to have higher confidence. Motivated by this, we propose to model the underlying label distributions and employ cross-label constraints to generate more accurate pseudo labels. In this paper, we develop a unified WSSS framework named Adaptive Gaussian Mixtures Model, which leverages a GMM to model the label distributions. Specifically, we calculate the feature distribution centers of pseudo-labeled pixels and build the GMM by measuring the distance between the centers and each pseudo-labeled pixel. Then, we introduce an Online Expectation-Maximization (OEM) algorithm and a novel maximization loss to optimize the GMM adaptively, aiming to learn more discriminative decision boundaries between different class-wise Gaussian mixtures. Based on the label distributions, we leverage the GMM to generate high-quality pseudo labels for more reliable supervision. Our framework is capable of solving different forms of weak labels: image-level labels, points, scribbles, blocks, and bounding-boxes. Extensive experiments on PASCAL, COCO, Cityscapes, and ADE20K datasets demonstrate that our framework can effectively provide more reliable supervision and outperform the state-of-the-art methods under all settings. Code will be available at https://github.com/Luffy03/AGMM-SASS.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.12235 [pdf, other]

IKSPARK: An Inverse Kinematics Solver using Semidefinite Relaxation and Rank Minimization

Authors: Liangting Wu, Roberto Tron

Abstract: Inverse kinematics (IK) is a fundamental problem that frequently occurs in robot control and motion planning. However, the problem is nonconvex because the kinematic map between the configuration and task spaces is generally nonlinear, which makes fast and accurate solutions challenging. The problem can be more complicated with the existence of different physical constraints imposed by the robot structure. In this paper, we develop an inverse kinematics solver named IKSPARK (Inverse Kinematics using Semidefinite Programming And RanK minimization) that can find solutions for robots with various structures, including open/closed kinematic chains, spherical, revolute, and/or prismatic joints. The solver works in the space of rotation matrices of the link reference frames and involves solving only convex semidefinite problems (SDPs). Specifically, the IK problem is formulated as an SDP with an additional rank-1 constraint on symmetric matrices with constant traces. The solver first solves this SDP disregarding the rank constraint to get a start point and then finds the rank-1 solution iteratively via a rank minimization algorithm with proven local convergence. Compared to other work that performs SDP relaxation for IK problems, our formulation is simpler, and uses variables with smaller sizes. We validate our approach via simulations on different robots, comparing against a standard IK method.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.04374 [pdf, other]

Model-Free Load Frequency Control of Nonlinear Power Systems Based on Deep Reinforcement Learning

Authors: Xiaodi Chen, Meng Zhang, Zhengguang Wu, Ligang Wu, Xiaohong Guan

Abstract: Load frequency control (LFC) is widely employed in power systems to stabilize frequency fluctuation and guarantee power quality. However, most existing LFC methods rely on accurate power system modeling and usually ignore the nonlinear characteristics of the system, limiting controllers' performance. To solve these problems, this paper proposes a model-free LFC method for nonlinear power systems based on deep deterministic policy gradient (DDPG) framework. The proposed method establishes an emulator network to emulate power system dynamics. After defining the action-value function, the emulator network is applied for control actions evaluation instead of the critic network. Then the actor network controller is effectively optimized by estimating the policy gradient based on zeroth-order optimization (ZOO) and backpropagation algorithm. Simulation results and corresponding comparisons demonstrate the designed controller can generate appropriate control actions and has strong adaptability for nonlinear power systems.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.17300 [pdf, other]

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Authors: Linshan Wu, Jiaxin Zhuang, Hao Chen

Abstract: Self-Supervised Learning (SSL) has demonstrated promising results in 3D medical image analysis. However, the lack of high-level semantics in pre-training still heavily hinders the performance of downstream tasks. We observe that 3D medical images contain relatively consistent contextual position information, i.e., consistent geometric relations between different organs, which leads to a potential way for us to learn consistent semantic representations in pre-training. In this paper, we propose a simple-yet-effective Volume Contrast (VoCo) framework to leverage the contextual position priors for pre-training. Specifically, we first generate a group of base crops from different regions while enforcing feature discrepancy among them, where we employ them as class assignments of different regions. Then, we randomly crop sub-volumes and predict which class they belong to (which region they are located in) by contrasting their similarity to different base crops, which can be seen as predicting contextual positions of different sub-volumes. Through this pretext task, VoCo implicitly encodes the contextual position priors into model representations without the guidance of annotations, enabling us to effectively improve the performance of downstream tasks that require high-level semantics. Extensive experimental results on six downstream tasks demonstrate the superior effectiveness of VoCo. Code will be available at https://github.com/Luffy03/VoCo.

Submitted 17 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024. The camera-ready version will soon be available

arXiv:2402.16186 [pdf, other]

An Execution-time-certified Riccati-based IPM Algorithm for RTI-based Input-constrained NMPC

Authors: Liang Wu, Krystian Ganko, Shimin Wang, Richard D. Braatz

Abstract: Establishing an execution time certificate in deploying model predictive control (MPC) is a pressing and challenging requirement. As nonlinear MPC (NMPC) results in nonlinear programs, differing from quadratic programs encountered in linear MPC, deriving an execution time certificate for NMPC seems an impossible task. Our prior work \cite{wu2023direct} introduced an input-constrained MPC algorithm with the exact and only dimension-dependent (data-independent) number of floating-point operations ([flops]). This paper extends it to input-constrained NMPC problems via the real-time iteration (RTI) scheme, which results in data-varying (but dimension-invariant) input-constrained MPC problems. Therefore, applying our previous algorithm can certify the execution time based on the assumption that processors perform fixed [flops] in constant time. As the RTI-based scheme generally results in MPC with a long prediction horizon, this paper employs the efficient factorized Riccati recursion, whose computational cost scales linearly with the prediction horizon, to solve the Newton system at each iteration. The execution-time certified capability of the algorithm is theoretically and numerically validated through a case study involving nonlinear control of the chaotic Lorenz system.

Submitted 25 February, 2024; originally announced February 2024.

Comments: 7 pages

arXiv:2402.11421 [pdf, other]

Analysis of Fatigue-Induced Compensatory Movements in Bicep Curls: Gaining Insights for the Deployment of Wearable Sensors

Authors: Ming Xuan Chua, Yoshiro Okubo, Shuhua Peng, Thanh Nho Do, Chun Hui Wang, Liao Wu

Abstract: A common challenge in Bicep Curls rehabilitation is muscle compensation, where patients adopt alternative movement patterns when the primary muscle group cannot act due to injury or fatigue, significantly decreasing the effectiveness of rehabilitation efforts. The problem is exacerbated by the growing trend toward transitioning from in-clinic to home-based rehabilitation, where constant monitoring and correction by physiotherapists are limited. Developing wearable sensors capable of detecting muscle compensation becomes crucial to address this challenge. This study aims to gain insights into the optimal deployment of wearable sensors through a comprehensive study of muscle compensation in Bicep Curls. We collect upper limb joint kinematics and surface electromyography signals (sEMG) from eight muscles in 12 healthy subjects during standard and fatigue stages. Two muscle synergies are derived from sEMG signals and are analyzed comprehensively along with joint kinematics. Our findings reveal a shift in the relative contribution of forearm muscles to shoulder muscles, accompanied by a significant increase in activation amplitude for both synergies. Additionally, more pronounced movement was observed at the shoulder joint during fatigue. These results suggest focusing on the shoulder muscle activities and joint motions when deploying wearable sensors to effectively detect compensatory movements.

Submitted 25 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 11 pages, 7 figures, accepted by T-MRB

arXiv:2401.15600 [pdf, other]

A Mechatronic System for the Visualisation and Analysis of Orchestral Conducting

Authors: Courtney Coates, Liao Wu

Abstract: This paper quantitatively analysed orchestral conducting patterns, and detected variations as a result of extraneous body movement during conducting, in the first experiment of its kind. A novel live conducting system featuring data capture, processing, and analysis was developed. Reliable data of an expert conductor's movements was collected, processed, and used to calculate average trajectories for different conducting techniques with various extraneous body movements; variations between extraneous body movement techniques and controlled technique were definitively determined in a novel quantitative analysis. A portable and affordable mechatronic system was created to capture and process live baton tip data, and was found to be accurate through calibration against a reliable reference. Experimental conducting field data was captured through the mechatronic system, and analysed against previously calculated average trajectories; the extraneous movement used during the field data capture was successfully identified by the system.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 10 pages, 10 figures, accepted by ACRA2023

Journal ref: Australasian Conference on Robotics and Automation (ACRA 2023). Sydney, Australia, 2023: 1-10

arXiv:2401.13957 [pdf, other]

doi 10.1109/TRO.2024.3486177

Automatic Tissue Traction Using Miniature Force-Sensing Forceps for Minimally Invasive Surgery

Authors: Tangyou Liu, Xiaoyi Wang, Jay Katupitiya, Jiaole Wang, Liao Wu

Abstract: A common limitation of autonomous tissue manipulation in robotic minimally invasive surgery (MIS) is the absence of force sensing and control at the tool level. Recently, our team has developed miniature force-sensing forceps that can simultaneously measure the grasping and pulling forces during tissue manipulation. Based on this design, here we further present a method to automate tissue traction that comprises grasping and pulling stages. During this process, the grasping and pulling forces can be controlled either separately or simultaneously through force decoupling. The force controller is built upon a static model of tissue manipulation, considering the interaction between the force-sensing forceps and soft tissue. The efficacy of this force control approach is validated through a series of experiments comparing targeted, estimated, and actual reference forces. To verify the feasibility of the proposed method in surgical applications, various tissue resections are conducted on ex vivo tissues employing a dual-arm robotic setup. Finally, we discuss the benefits of multi-force control in tissue traction, evidenced through comparative analyses of various ex vivo tissue resections with and without the proposed method, and the potential generalization with traction on different tissues. The results affirm the feasibility of implementing automatic tissue traction using miniature forceps with multi-force control, suggesting its potential to promote autonomous MIS. A video demonstrating the experiments can be found at https://youtu.be/f5gXuXe67Ak.

Submitted 2 November, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 15 pages, 14 figures, accepted by T-RO

arXiv:2401.04653 [pdf, other]

Time-certified Input-constrained NMPC via Koopman Operator

Authors: Liang Wu, Krystian Ganko, Richard D. Braatz

Abstract: Determining solving-time certificates of nonlinear model predictive control (NMPC) implementations is a pressing requirement when deploying NMPC in production environments. Such a certificate guarantees that the NMPC controller returns a solution before the next sampling time. However, NMPC formulations produce nonlinear programs (NLPs) for which it is very difficult to derive their solving-time certificates. Our previous work, Wu and Braatz (2023), challenged this limitation with a proposed input-constrained MPC algorithm having exact iteration complexity but was restricted to linear MPC formulations. This work extends the algorithm to solve input-constrained NMPC problems, by using the Koopman operator and a condensing MPC technique. We illustrate the algorithm performance on a high-dimensional, nonlinear partial differential equation (PDE) control case study, in which we theoretically and numerically certify the solving time to be less than the sampling time.

Submitted 26 February, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 6 pages, submitted into 8th IFAC Conference on Nonlinear Model Predictive Control NMPC 2024

arXiv:2401.04412 [pdf, other]

Deep Covariance Alignment for Domain Adaptive Remote Sensing Image Segmentation

Authors: Linshan Wu, Ming Lu, Leyuan Fang

Abstract: Unsupervised domain adaptive (UDA) image segmentation has recently gained increasing attention, aiming to improve the generalization capability for transferring knowledge from the source domain to the target domain. However, in high spatial resolution remote sensing image (RSI), the same category from different domains (e.g., urban and rural) can appear to be totally different with extremely inconsistent distributions, which heavily limits the UDA accuracy. To address this problem, in this paper, we propose a novel Deep Covariance Alignment (DCA) model for UDA RSI segmentation. The DCA can explicitly align category features to learn shared domain-invariant discriminative feature representations, which enhances the ability of model generalization. Specifically, a Category Feature Pooling (CFP) module is first employed to extract category features by combining the coarse outputs and the deep features. Then, we leverage a novel Covariance Regularization (CR) to enforce the intra-category features to be closer and the inter-category features to be further separate. Compared with the existing category alignment methods, our CR aims to regularize the correlation between different dimensions of the features and thus performs more robustly when dealing with the divergent category features of imbalanced and inconsistent distributions. Finally, we propose a stagewise procedure to train the DCA in order to alleviate the error accumulation.
Experiments on both Rural-to-Urban and Urban-to-Rural scenarios of the LoveDA dataset demonstrate the superiority of our proposed DCA over other state-of-the-art UDA segmentation methods. Code is available at https://github.com/Luffy03/DCA.

Submitted 9 January, 2024; originally announced January 2024.

Comments: A paper accepted by TGRS

arXiv:2401.02081 [pdf, ps, other]

Performance Trade-off and Joint Waveform Design for MIMO-OFDM DFRC Systems

Authors: Tianchen Liu, Liang Wu, Bo An, Zaichen Zhang, Jian Dang, Jiangzhou Wang

Abstract: Dual-functional radar-communication (DFRC) has attracted considerable attention. This paper considers the frequency-selective multipath fading environment and proposes DFRC waveform design strategies based on multiple-input and multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) techniques. In the proposed waveform design strategies, the Cramer-Rao bound (CRB) of the radar system, the inter-stream interference (ISI) and the achievable rate of the communication system, are respectively considered as the performance metrics. In this paper, we focus on the performance trade-off between the radar system and the communication system, and the optimization problems are formulated. In the ISI minimization based waveform design strategy, the optimization problem is convex and can be easily solved. In the achievable rate maximization based waveform design strategy, we propose a water-filling (WF) and sequential quadratic programming (SQP) based algorithm to derive the covariance matrix and the precoding matrix. Simulation results validate the proposed DFRC waveform designs and show that the achievable rate maximization based strategy has a better performance than the ISI minimization based strategy.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.15921 [pdf, other]

Hybrid Precoder Design for Angle-of-Departure Estimation with Limited-Resolution Phase Shifters

Authors: Huiping Huang, Musa Furkan Keskin, Henk Wymeersch, Xuesong Cai, Linlong Wu, Johan Thunberg, Fredrik Tufvesson

Abstract: Hybrid analog-digital beamforming stands out as a key enabler for future communication systems with a massive number of antennas. In this paper, we investigate the hybrid precoder design problem for angle-of-departure (AoD) estimation, where we take into account the practical constraint on the limited resolution of phase shifters. Our goal is to design a radio-frequency (RF) precoder and a base-band (BB) precoder to estimate AoD of the user with a high accuracy. To this end, we propose a two-step strategy where we first obtain the fully digital precoder that minimizes the angle error bound, and then the resulting digital precoder is decomposed into an RF precoder and a BB precoder, based on the alternating optimization and the alternating direction method of multipliers. Besides, we derive the quantization error upper bound and analyse the convergence behavior of the proposed algorithm. Numerical results demonstrate the superior performance of the proposed method over state-of-the-art baselines.

Submitted 22 October, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: This paper has been accepted for publication in IEEE Transactions on Communications

Showing 1–50 of 109 results for author: Wu, L