-
HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss
Authors:
Yi Huang,
Ke Zhang,
Wei Liu,
Yuanyuan Wang,
Vishal M. Patel,
Le Lu,
Xu Han,
Dakai Jin,
Ke Yan
Abstract:
Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is crucial for computer-aided diagnosis, radiotherapy, and surgical planning. However, significant challenges exist in algorithm design when faced with diverse sizes, complex topologies, and (often) incomplete data annotation of these structures. We address these difficulties by proposing a new tubular…
▽ More
Accurate segmentation of tubular structures in medical images, such as vessels and airway trees, is crucial for computer-aided diagnosis, radiotherapy, and surgical planning. However, significant challenges exist in algorithm design when faced with diverse sizes, complex topologies, and (often) incomplete data annotation of these structures. We address these difficulties by proposing a new tubular structure segmentation framework named HarmonySeg. First, we design a deep-to-shallow decoder network featuring flexible convolution blocks with varying receptive fields, which enables the model to effectively adapt to tubular structures of different scales. Second, to highlight potential anatomical regions and improve the recall of small tubular structures, we incorporate vesselness maps as auxiliary information. These maps are aligned with image features through a shallow-and-deep fusion module, which simultaneously eliminates unreasonable candidates to maintain high precision. Finally, we introduce a topology-preserving loss function that leverages contextual and shape priors to balance the growth and suppression of tubular structures, which also allows the model to handle low-quality and incomplete annotations. Extensive quantitative experiments are conducted on four public datasets. The results show that our model can accurately segment 2D and 3D tubular structures and outperform existing state-of-the-art methods. External validation on a private dataset also demonstrates good generalizability.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT
Authors:
Dazhou Guo,
Zhanghexuan Ji,
Yanzhou Su,
Dandan Zheng,
Heng Guo,
Puyang Wang,
Ke Yan,
Yirui Wang,
Qinji Yu,
Zi Li,
Minfeng Xu,
Jianfeng Zhang,
Haoshen Li,
Jia Ge,
Tsung-Ying Ho,
Bing-Shen Huang,
Tashan Ai,
Kuaile Zhao,
Na Shen,
Qifeng Wang,
Yun Bian,
Tingyu Wu,
Peng Du,
Hua Zhang,
Feng-Ming Kong
, et al. (9 additional authors not shown)
Abstract:
Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized…
▽ More
Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, there is no such fully annotated CT dataset with all anatomies delineated for training because of the exceptionally high manual cost, the need for specialized clinical expertise, and the time required to finish the task. To this end, we proposed a novel continual learning-driven CT model that can segment complete anatomies presented using dozens of previously partially labeled datasets, dynamically expanding its capacity to segment new ones without compromising previously learned organ knowledge. Existing multi-dataset approaches are not able to dynamically segment new anatomies without catastrophic forgetting and would encounter optimization difficulty or infeasibility when segmenting hundreds of anatomies across the whole range of body regions. Our single unified CT segmentation model, CL-Net, can highly accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies. Composed of a universal encoder, multiple optimized and pruned decoders, CL-Net is developed using 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets of various vendors, different contrast phases, and pathologies. Extensive evaluation demonstrates that CL-Net consistently outperforms the upper limit of an ensemble of 36 specialist nnUNets trained per dataset with the complexity of 5% model size and significantly surpasses the segmentation accuracy of recent leading Segment Anything-style medical image foundation models by large margins. Our continual learning-driven CL-Net model would lay a solid foundation to facilitate many downstream tasks of oncology and chronic diseases using the most widely adopted CT imaging.
△ Less
Submitted 16 March, 2025;
originally announced March 2025.
-
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking
Authors:
Yiming Li,
Kaiying Yan,
Shuo Shao,
Tongqing Zhai,
Shu-Tao Xia,
Zhan Qin,
Dacheng Tao
Abstract:
With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW)…
▽ More
With the increasing adoption of deep learning in speaker verification, large-scale speech datasets have become valuable intellectual property. To audit and prevent the unauthorized usage of these valuable released datasets, especially in commercial or open-source scenarios, we propose a novel dataset ownership verification method. Our approach introduces a clustering-based backdoor watermark (CBW), enabling dataset owners to determine whether a suspicious third-party model has been trained on a protected dataset under a black-box setting. The CBW method consists of two key stages: dataset watermarking and ownership verification. During watermarking, we implant multiple trigger patterns in the dataset to make similar samples (measured by their feature similarities) close to the same trigger while dissimilar samples are near different triggers. This ensures that any model trained on the watermarked dataset exhibits specific misclassification behaviors when exposed to trigger-embedded inputs. To verify dataset ownership, we design a hypothesis-test-based framework that statistically evaluates whether a suspicious model exhibits the expected backdoor behavior. We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks. The code for reproducing main experiments is available at https://github.com/Radiant0726/CBW
△ Less
Submitted 5 April, 2025; v1 submitted 1 March, 2025;
originally announced March 2025.
-
Computer-aided shape features extraction and regression models for predicting the ascending aortic aneurysm growth rate
Authors:
Leonardo Geronzi,
Antonio Martinez,
Michel Rochette,
Kexin Yan,
Aline Bel-Brunon,
Pascal Haigron,
Pierre Escrig,
Jacques Tomasi,
Morgan Daniel,
Alain Lalande,
Siyu Lin,
Diana Marcela Marin-Castrillon,
Olivier Bouchot,
Jean Porterie,
Pier Paolo Valentini,
Marco Evangelos Biancolini
Abstract:
Objective: ascending aortic aneurysm growth prediction is still challenging in clinics. In this study, we evaluate and compare the ability of local and global shape features to predict ascending aortic aneurysm growth.
Material and methods: 70 patients with aneurysm, for which two 3D acquisitions were available, are included. Following segmentation, three local shape features are computed: (1) t…
▽ More
Objective: ascending aortic aneurysm growth prediction is still challenging in clinics. In this study, we evaluate and compare the ability of local and global shape features to predict ascending aortic aneurysm growth.
Material and methods: 70 patients with aneurysm, for which two 3D acquisitions were available, are included. Following segmentation, three local shape features are computed: (1) the ratio between maximum diameter and length of the ascending aorta centerline, (2) the ratio between the length of external and internal lines on the ascending aorta and (3) the tortuosity of the ascending tract. By exploiting longitudinal data, the aneurysm growth rate is derived. Using radial basis function mesh morphing, iso-topological surface meshes are created. Statistical shape analysis is performed through unsupervised principal component analysis (PCA) and supervised partial least squares (PLS). Two types of global shape features are identified: three PCA-derived and three PLS-based shape modes. Three regression models are set for growth prediction: two based on gaussian support vector machine using local and PCA-derived global shape features; the third is a PLS linear regression model based on the related global shape features. The prediction results are assessed and the aortic shapes most prone to growth are identified.
Results: the prediction root mean square error from leave-one-out cross-validation is: 0.112 mm/month, 0.083 mm/month and 0.066 mm/month for local, PCA-based and PLS-derived shape features, respectively. Aneurysms close to the root with a large initial diameter report faster growth.
Conclusion: global shape features might provide an important contribution for predicting the aneurysm growth.
△ Less
Submitted 4 March, 2025;
originally announced March 2025.
-
End-to-end Multi-source Visual Prompt Tuning for Survival Analysis in Whole Slide Images
Authors:
Zhongwei Qiu,
Hanqing Chao,
Wenbin Liu,
Yixuan Shen,
Le Lu,
Ke Yan,
Dakai Jin,
Yun Bian,
Hui Jiang
Abstract:
Survival analysis using pathology images poses a considerable challenge, as it requires the localization of relevant information from the multitude of tiles within whole slide images (WSIs). Current methods typically resort to a two-stage approach, where a pre-trained network extracts features from tiles, which are then used by survival models. This process, however, does not optimize the survival…
▽ More
Survival analysis using pathology images poses a considerable challenge, as it requires the localization of relevant information from the multitude of tiles within whole slide images (WSIs). Current methods typically resort to a two-stage approach, where a pre-trained network extracts features from tiles, which are then used by survival models. This process, however, does not optimize the survival models in an end-to-end manner, and the pre-extracted features may not be ideally suited for survival prediction. To address this limitation, we present a novel end-to-end Visual Prompt Tuning framework for survival analysis, named VPTSurv. VPTSurv refines feature embeddings through an efficient encoder-decoder framework. The encoder remains fixed while the framework introduces tunable visual prompts and adaptors, thus permitting end-to-end training specifically for survival prediction by optimizing only the lightweight adaptors and the decoder. Moreover, the versatile VPTSurv framework accommodates multi-source information as prompts, thereby enriching the survival model. VPTSurv achieves substantial increases of 8.7% and 12.5% in the C-index on two immunohistochemical pathology image datasets. These significant improvements highlight the transformative potential of the end-to-end VPT framework over traditional two-stage methods.
△ Less
Submitted 5 September, 2024;
originally announced September 2024.
-
Improved Esophageal Varices Assessment from Non-Contrast CT Scans
Authors:
Chunli Li,
Xiaoming Zhang,
Yuan Gao,
Xiaoli Yin,
Le Lu,
Ling Zhang,
Ke Yan,
Yu Shi
Abstract:
Esophageal varices (EV), a serious health concern resulting from portal hypertension, are traditionally diagnosed through invasive endoscopic procedures. Despite non-contrast computed tomography (NC-CT) imaging being a less expensive and non-invasive imaging modality, it has yet to gain full acceptance as a primary clinical diagnostic tool for EV evaluation. To overcome existing diagnostic challen…
▽ More
Esophageal varices (EV), a serious health concern resulting from portal hypertension, are traditionally diagnosed through invasive endoscopic procedures. Despite non-contrast computed tomography (NC-CT) imaging being a less expensive and non-invasive imaging modality, it has yet to gain full acceptance as a primary clinical diagnostic tool for EV evaluation. To overcome existing diagnostic challenges, we present the Multi-Organ-cOhesion-Network (MOON), a novel framework enhancing the analysis of critical organ features in NC-CT scans for effective assessment of EV. Drawing inspiration from the thorough assessment practices of radiologists, MOON establishes a cohesive multiorgan analysis model that unifies the imaging features of the related organs of EV, namely esophagus, liver, and spleen. This integration significantly increases the diagnostic accuracy for EV. We have compiled an extensive NC-CT dataset of 1,255 patients diagnosed with EV, spanning three grades of severity. Each case is corroborated by endoscopic diagnostic results. The efficacy of MOON has been substantiated through a validation process involving multi-fold cross-validation on 1,010 cases and an independent test on 245 cases, exhibiting superior diagnostic performance compared to methods focusing solely on the esophagus (for classifying severe grade: AUC of 0.864 versus 0.803, and for moderate to severe grades: AUC of 0.832 versus 0.793). To our knowledge, MOON is the first work to incorporate a synchronized multi-organ NC-CT analysis for EV assessment, providing a more acceptable and minimally invasive alternative for patients compared to traditional endoscopy.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Infrared Image Super-Resolution via Lightweight Information Split Network
Authors:
Shijie Liu,
Kang Yan,
Feiwei Qin,
Changmiao Wang,
Ruiquan Ge,
Kai Zhang,
Jie Huang,
Yong Peng,
Jin Cao
Abstract:
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory…
▽ More
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
△ Less
Submitted 27 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Authors:
Wei Fang,
Yuxing Tang,
Heng Guo,
Mingze Yuan,
Tony C. W. Mok,
Ke Yan,
Jiawen Yao,
Xin Chen,
Zaiyi Liu,
Le Lu,
Ling Zhang,
Minfeng Xu
Abstract:
In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur…
▽ More
In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to surmount these challenges, enhancing inter-slice resolution and overall 3D medical imaging quality. However, existing approaches confront inherent challenges: 1) often tailored to specific upsampling factors, lacking flexibility for diverse clinical scenarios; 2) newly generated slices frequently suffer from over-smoothing, degrading fine details, and leading to inter-slice inconsistency. In response, this study presents CycleINR, a novel enhanced Implicit Neural Representation model for 3D medical data volumetric super-resolution. Leveraging the continuity of the learned implicit function, the CycleINR model can achieve results with arbitrary up-sampling rates, eliminating the need for separate training. Additionally, we enhance the grid sampling in CycleINR with a local attention mechanism and mitigate over-smoothing by integrating cycle-consistent loss. We introduce a new metric, Slice-wise Noise Level Inconsistency (SNLI), to quantitatively assess inter-slice noise level inconsistency. The effectiveness of our approach is demonstrated through image quality evaluations on an in-house dataset and a downstream task analysis on the Medical Segmentation Decathlon liver tumor dataset.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023
Authors:
Jun Lyu,
Chen Qin,
Shuo Wang,
Fanwen Wang,
Yan Li,
Zi Wang,
Kunyuan Guo,
Cheng Ouyang,
Michael Tänzer,
Meng Liu,
Longyu Sun,
Mengting Sun,
Qin Li,
Zhang Shi,
Sha Hua,
Hao Li,
Zhensen Chen,
Zhenlin Zhang,
Bingyu Xin,
Dimitris N. Metaxas,
George Yiasemis,
Jonas Teuwen,
Liping Zhang,
Weitian Chen,
Yidong Zhao
, et al. (25 additional authors not shown)
Abstract:
Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation p…
▽ More
Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platform hinder the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitted for both cine and mapping tasks. All teams use deep learning based approaches, indicating that deep learning has predominately become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as backbones. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in Cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field.
△ Less
Submitted 16 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
LKFormer: Large Kernel Transformer for Infrared Image Super-Resolution
Authors:
Feiwei Qin,
Kang Yan,
Changmiao Wang,
Ruiquan Ge,
Yong Peng,
Kai Zhang
Abstract:
Given the broad application of infrared technology across diverse fields, there is an increasing emphasis on investigating super-resolution techniques for infrared images within the realm of deep learning. Despite the impressive results of current Transformer-based methods in image super-resolution tasks, their reliance on the self-attentive mechanism intrinsic to the Transformer architecture resu…
▽ More
Given the broad application of infrared technology across diverse fields, there is an increasing emphasis on investigating super-resolution techniques for infrared images within the realm of deep learning. Despite the impressive results of current Transformer-based methods in image super-resolution tasks, their reliance on the self-attentive mechanism intrinsic to the Transformer architecture results in images being treated as one-dimensional sequences, thereby neglecting their inherent two-dimensional structure. Moreover, infrared images exhibit a uniform pixel distribution and a limited gradient range, posing challenges for the model to capture effective feature information. Consequently, we suggest a potent Transformer model, termed Large Kernel Transformer (LKFormer), to address this issue. Specifically, we have designed a Large Kernel Residual Attention (LKRA) module with linear complexity. This mainly employs depth-wise convolution with large kernels to execute non-local feature modeling, thereby substituting the standard self-attentive layer. Additionally, we have devised a novel feed-forward network structure called Gated-Pixel Feed-Forward Network (GPFN) to augment the LKFormer's capacity to manage the information flow within the network. Comprehensive experimental results reveal that our method surpasses the most advanced techniques available, using fewer parameters and yielding considerably superior performance.The source code will be available at https://github.com/sad192/large-kernel-Transformer.
△ Less
Submitted 24 January, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network
Authors:
Ke Yan,
Xiaoli Yin,
Yingda Xia,
Fakai Wang,
Shu Wang,
Yuan Gao,
Jiawen Yao,
Chunli Li,
Xiaoyu Bai,
Jingren Zhou,
Ling Zhang,
Le Lu,
Yu Shi
Abstract:
Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and…
▽ More
Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNN and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.
△ Less
Submitted 21 October, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
Authors:
Gangwoo Kim,
Hajung Kim,
Lei Ji,
Seongsu Bae,
Chanhwi Kim,
Mujeen Sung,
Hyunjae Kim,
Kun Yan,
Eric Chang,
Jaewoo Kang
Abstract:
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively…
▽ More
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
△ Less
Submitted 10 July, 2023;
originally announced July 2023.
-
A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans
Authors:
Fakai Wang,
Chi-Tung Cheng,
Chien-Wei Peng,
Ke Yan,
Min Wu,
Le Lu,
Chien-Hung Liao,
Ling Zhang
Abstract:
Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and…
▽ More
Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting).
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans
Authors:
Jieneng Chen,
Yingda Xia,
Jiawen Yao,
Ke Yan,
Jianpeng Zhang,
Le Lu,
Fakai Wang,
Bo Zhou,
Mingyan Qiu,
Qihang Yu,
Mingze Yuan,
Wei Fang,
Yuxing Tang,
Minfeng Xu,
Jian Zhou,
Yuqian Zhao,
Qifeng Wang,
Xianghua Ye,
Xiaoli Yin,
Yu Shi,
Xin Chen,
Jingren Zhou,
Alan Yuille,
Zaiyi Liu,
Ling Zhang
Abstract:
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading…
▽ More
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption. A certain number of AI models need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence & location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model with the output of multi-tumor prediction. We decouple the object queries into organ queries, tumor detection queries and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically-inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end using a curated large-scale CT images of 10,042 patients including eight major types of cancers and occurring non-cancer tumors (all are pathology-confirmed with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT has demonstrated strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer towards a universal high performance cancer screening tool.
△ Less
Submitted 6 October, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Panchromatic and Multispectral Image Fusion via Alternating Reverse Filtering Network
Authors:
Keyu Yan,
Man Zhou,
Jie Huang,
Feng Zhao,
Chengjun Xie,
Chongyi Li,
Danfeng Hong
Abstract:
Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-s…
▽ More
Panchromatic (PAN) and multi-spectral (MS) image fusion, named Pan-sharpening, refers to super-resolve the low-resolution (LR) multi-spectral (MS) images in the spatial domain to generate the expected high-resolution (HR) MS images, conditioning on the corresponding high-resolution PAN images. In this paper, we present a simple yet effective \textit{alternating reverse filtering network} for pan-sharpening. Inspired by the classical reverse filtering that reverses images to the status before filtering, we formulate pan-sharpening as an alternately iterative reverse filtering process, which fuses LR MS and HR MS in an interpretable manner. Different from existing model-driven methods that require well-designed priors and degradation assumptions, the reverse filtering process avoids the dependency on pre-defined exact priors. To guarantee the stability and convergence of the iterative process via contraction mapping on a metric space, we develop the learnable multi-scale Gaussian kernel module, instead of using specific filters. We demonstrate the theoretical feasibility of such formulations. Extensive experiments on diverse scenes to thoroughly verify the performance of our method, significantly outperforming the state of the arts.
△ Less
Submitted 14 October, 2022;
originally announced October 2022.
-
Feedforward Control in the Presence of Input Nonlinearities: A Learning-based Approach
Authors:
Jilles van Hulst,
Maurice Poot,
Dragan Kostić,
Kai Wa Yan,
Jim Portegies,
Tom Oomen
Abstract:
Advanced feedforward control methods enable mechatronic systems to perform varying motion tasks with extreme accuracy and throughput. The aim of this paper is to develop a data-driven feedforward controller that addresses input nonlinearities, which are common in typical applications such as semiconductor back-end equipment. The developed method consists of parametric inverse-model feedforward tha…
▽ More
Advanced feedforward control methods enable mechatronic systems to perform varying motion tasks with extreme accuracy and throughput. The aim of this paper is to develop a data-driven feedforward controller that addresses input nonlinearities, which are common in typical applications such as semiconductor back-end equipment. The developed method consists of parametric inverse-model feedforward that is optimized for tracking error reduction by exploiting ideas from iterative learning control. Results on a simulated set-up indicate improved performance over existing identification methods for systems with nonlinearities at the input.
△ Less
Submitted 23 September, 2022;
originally announced September 2022.
-
A New Probabilistic V-Net Model with Hierarchical Spatial Feature Transform for Efficient Abdominal Multi-Organ Segmentation
Authors:
Minfeng Xu,
Heng Guo,
Jianfeng Zhang,
Ke Yan,
Le Lu
Abstract:
Accurate and robust abdominal multi-organ segmentation from CT imaging of different modalities is a challenging task due to complex inter- and intra-organ shape and appearance variations among abdominal organs. In this paper, we propose a probabilistic multi-organ segmentation network with hierarchical spatial-wise feature modulation to capture flexible organ semantic variants and inject the learn…
▽ More
Accurate and robust abdominal multi-organ segmentation from CT imaging of different modalities is a challenging task due to complex inter- and intra-organ shape and appearance variations among abdominal organs. In this paper, we propose a probabilistic multi-organ segmentation network with hierarchical spatial-wise feature modulation to capture flexible organ semantic variants and inject the learnt variants into different scales of feature maps for guiding segmentation. More specifically, we design an input decomposition module via a conditional variational auto-encoder to learn organ-specific distributions on the low dimensional latent space and model richer organ semantic variations that is conditioned on input images.Then by integrating these learned variations into the V-Net decoder hierarchically via spatial feature transformation, which has the ability to convert the variations into conditional Affine transformation parameters for spatial-wise feature maps modulating and guiding the fine-scale segmentation. The proposed method is trained on the publicly available AbdomenCT-1K dataset and evaluated on two other open datasets, i.e., 100 challenging/pathological testing patient cases from AbdomenCT-1K fully-supervised abdominal organ segmentation benchmark and 90 cases from TCIA+&BTCV dataset. Highly competitive or superior quantitative segmentation results have been achieved using these datasets for four abdominal organs of liver, kidney, spleen and pancreas with reported Dice scores improved by 7.3% for kidneys and 9.7% for pancreas, while being ~7 times faster than two strong baseline segmentation methods(nnUNet and CoTr).
△ Less
Submitted 2 August, 2022;
originally announced August 2022.
-
Improving the estimation of directional area scattering factor (DASF) from canopy reflectance: theoretical basis and validation
Authors:
Yi Lin,
Siyuan Liu,
Lei Yan,
Kai Yan,
Yelu Zeng,
Bin Yang
Abstract:
Directional area scattering factor (DASF) is a critical canopy structural parameter for vegetation monitoring. It provides an efficient tool for decoupling of canopy structure and leaf optics from canopy reflectance. Current standard approach to estimate DASF from canopy bidirectional reflectance factor (BRF) is based on the assumption that in the weakly absorbing 710 to 790 nm spectral interval,…
▽ More
Directional area scattering factor (DASF) is a critical canopy structural parameter for vegetation monitoring. It provides an efficient tool for decoupling of canopy structure and leaf optics from canopy reflectance. Current standard approach to estimate DASF from canopy bidirectional reflectance factor (BRF) is based on the assumption that in the weakly absorbing 710 to 790 nm spectral interval, leaf scattering does not change much with the concentration of dry matter and thus its variation can be neglected. This results in biased estimates of DASF and consequently leads to uncertainty in DASF-related applications. This study proposes a new approach to account for variations in concentrations of this biochemical constituent, which additionally uses the canopy BRF at 2260 nm. In silico analysis of the proposed approach suggests significant increase in accuracy over the standard technique by a relative root mean square error (rRMSE) of 49% and 34% for one- and three dimensional scenes, respectively. When compared with indoor multi-angular hyperspectral measurements reported in literature, the mean absolute error has reduced by 68% for needle leaf and 20% for broadleaf canopies. Thus, the proposed DASF estimation approach outperforms the current one and can be used more reliably in DASF-related applications, such as vegetation monitoring of functional traits, dynamics, and radiation budget.
△ Less
Submitted 27 April, 2022;
originally announced April 2022.
-
Memory-augmented Deep Unfolding Network for Guided Image Super-resolution
Authors:
Man Zhou,
Keyu Yan,
Jinshan Pan,
Wenqi Ren,
Qi Xie,
Xiangyong Cao
Abstract:
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image. However, previous model-based methods mainly takes the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image, simply ignoring many non-local comm…
▽ More
Guided image super-resolution (GISR) aims to obtain a high-resolution (HR) target image by enhancing the spatial resolution of a low-resolution (LR) target image under the guidance of a HR image. However, previous model-based methods mainly takes the entire image as a whole, and assume the prior distribution between the HR target image and the HR guidance image, simply ignoring many non-local common characteristics between them. To alleviate this issue, we firstly propose a maximal a posterior (MAP) estimation model for GISR with two types of prior on the HR target image, i.e., local implicit prior and global implicit prior. The local implicit prior aims to model the complex relationship between the HR target image and the HR guidance image from a local perspective, and the global implicit prior considers the non-local auto-regression property between the two images from a global perspective. Secondly, we design a novel alternating optimization algorithm to solve this model for GISR. The algorithm is in a concise framework that facilitates to be replicated into commonly used deep network structures. Thirdly, to reduce the information loss across iterative stages, the persistent memory mechanism is introduced to augment the information representation by exploiting the Long short-term memory unit (LSTM) in the image and feature spaces. In this way, a deep network with certain interpretation and high representation ability is built. Extensive experimental results validate the superiority of our method on a variety of GISR tasks, including Pan-sharpening, depth image super-resolution, and MR image super-resolution.
△ Less
Submitted 12 February, 2022;
originally announced March 2022.
-
The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients
Authors:
Bhakti Baheti,
Satrajit Chakrabarty,
Hamed Akbari,
Michel Bilello,
Benedikt Wiestler,
Julian Schwarting,
Evan Calabrese,
Jeffrey Rudie,
Syed Abidi,
Mina Mousa,
Javier Villanueva-Meyer,
Brandon K. K. Fields,
Florian Kofler,
Russell Takeshi Shinohara,
Juan Eugenio Iglesias,
Tony C. W. Mok,
Albert C. S. Chung,
Marek Wodzinski,
Artur Jurgas,
Niccolo Marini,
Manfredo Atzori,
Henning Muller,
Christoph Grobroehmer,
Hanna Siebert,
Lasse Hansen
, et al. (48 additional authors not shown)
Abstract:
Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registr…
▽ More
Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, as the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and test-time instance optimization per-case basis as a post-processing step. The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/.
△ Less
Submitted 17 April, 2024; v1 submitted 13 December, 2021;
originally announced December 2021.
-
Accurate and Generalizable Quantitative Scoring of Liver Steatosis from Ultrasound Images via Scalable Deep Learning
Authors:
Bowen Li,
Dar-In Tai,
Ke Yan,
Yi-Cheng Chen,
Shiu-Feng Huang,
Tse-Hwa Hsu,
Wan-Ting Yu,
Jing Xiao,
Le Lu,
Adam P. Harrison
Abstract:
Background & Aims: Hepatic steatosis is a major cause of chronic liver disease. 2D ultrasound is the most widely used non-invasive tool for screening and monitoring, but associated diagnoses are highly subjective. We developed a scalable deep learning (DL) algorithm for quantitative scoring of liver steatosis from 2D ultrasound images.
Approach & Results: Using retrospectively collected multi-vi…
▽ More
Background & Aims: Hepatic steatosis is a major cause of chronic liver disease. 2D ultrasound is the most widely used non-invasive tool for screening and monitoring, but associated diagnoses are highly subjective. We developed a scalable deep learning (DL) algorithm for quantitative scoring of liver steatosis from 2D ultrasound images.
Approach & Results: Using retrospectively collected multi-view ultrasound data from 3,310 patients, 19,513 studies, and 228,075 images, we trained a DL algorithm to diagnose steatosis stages (healthy, mild, moderate, or severe) from ultrasound diagnoses. Performance was validated on two multi-scanner unblinded and blinded (initially to DL developer) histology-proven cohorts (147 and 112 patients) with histopathology fatty cell percentage diagnoses, and a subset with FibroScan diagnoses. We also quantified reliability across scanners and viewpoints. Results were evaluated using Bland-Altman and receiver operating characteristic (ROC) analysis. The DL algorithm demonstrates repeatable measurements with a moderate number of images (3 for each viewpoint) and high agreement across 3 premium ultrasound scanners. High diagnostic performance was observed across all viewpoints: area under the curves of the ROC to classify >=mild, >=moderate, =severe steatosis grades were 0.85, 0.90, and 0.93, respectively. The DL algorithm outperformed or performed at least comparably to FibroScan with statistically significant improvements for all levels on the unblinded histology-proven cohort, and for =severe steatosis on the blinded histology-proven cohort.
Conclusions: The DL algorithm provides a reliable quantitative steatosis assessment across view and scanners on two multi-scanner cohorts. Diagnostic performance was high with comparable or better performance than FibroScan.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
SAME: Deformable Image Registration based on Self-supervised Anatomical Embeddings
Authors:
Fengze Liu,
Ke Yan,
Adam Harrison,
Dazhou Guo,
Le Lu,
Alan Yuille,
Lingyun Huang,
Guotong Xie,
Jing Xiao,
Xianghua Ye,
Dakai Jin
Abstract:
In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm SAM, which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAME, which breaks down image registration into three steps: affine transformation, coarse deformation, and deep d…
▽ More
In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm SAM, which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAME, which breaks down image registration into three steps: affine transformation, coarse deformation, and deep deformable registration. Using SAM embeddings, we enhance these steps by finding more coherent correspondences, and providing features and a loss function with better semantic guidance. We collect a multi-phase chest computed tomography dataset with 35 annotated organs for each patient and conduct inter-subject registration for quantitative evaluation. Results show that SAME outperforms widely-used traditional registration techniques (Elastix FFD, ANTs SyN) and learning based VoxelMorph method by at least 4.7% and 2.7% in Dice scores for two separate tasks of within-contrast-phase and across-contrast-phase registration, respectively. SAME achieves the comparable performance to the best traditional registration method, DEEDS (from our evaluation), while being orders of magnitude faster (from 45 seconds to 1.2 seconds).
△ Less
Submitted 23 September, 2021;
originally announced September 2021.
-
Lesion Segmentation and RECIST Diameter Prediction via Click-driven Attention and Dual-path Connection
Authors:
Youbao Tang,
Ke Yan,
Jinzheng Cai,
Lingyun Huang,
Guotong Xie,
Jing Xiao,
Jingjing Lu,
Gigin Lin,
Le Lu
Abstract:
Measuring lesion size is an important step to assess tumor growth and monitor disease progression and therapy response in oncology image analysis. Although it is tedious and highly time-consuming, radiologists have to work on this task by using RECIST criteria (Response Evaluation Criteria In Solid Tumors) routinely and manually. Even though lesion segmentation may be the more accurate and clinica…
▽ More
Measuring lesion size is an important step to assess tumor growth and monitor disease progression and therapy response in oncology image analysis. Although it is tedious and highly time-consuming, radiologists have to work on this task by using RECIST criteria (Response Evaluation Criteria In Solid Tumors) routinely and manually. Even though lesion segmentation may be the more accurate and clinically more valuable means, physicians can not manually segment lesions as now since much more heavy laboring will be required. In this paper, we present a prior-guided dual-path network (PDNet) to segment common types of lesions throughout the whole body and predict their RECIST diameters accurately and automatically. Similar to [1], a click guidance from radiologists is the only requirement. There are two key characteristics in PDNet: 1) Learning lesion-specific attention matrices in parallel from the click prior information by the proposed prior encoder, named click-driven attention; 2) Aggregating the extracted multi-scale features comprehensively by introducing top-down and bottom-up connections in the proposed decoder, named dual-path connection. Experiments show the superiority of our proposed PDNet in lesion segmentation and RECIST diameter prediction using the DeepLesion dataset and an external test set. PDNet learns comprehensive and representative deep image features for our tasks and produces more accurate results on both lesion segmentation and RECIST diameter prediction.
△ Less
Submitted 4 May, 2021;
originally announced May 2021.
-
Weakly-Supervised Universal Lesion Segmentation with Regional Level Set Loss
Authors:
Youbao Tang,
Jinzheng Cai,
Ke Yan,
Lingyun Huang,
Guotong Xie,
Jing Xiao,
Jingjing Lu,
Gigin Lin,
Le Lu
Abstract:
Accurately segmenting a variety of clinically significant lesions from whole body computed tomography (CT) scans is a critical task on precision oncology imaging, denoted as universal lesion segmentation (ULS). Manual annotation is the current clinical practice, being highly time-consuming and inconsistent on tumor's longitudinal assessment. Effectively training an automatic segmentation model is…
▽ More
Accurately segmenting a variety of clinically significant lesions from whole body computed tomography (CT) scans is a critical task on precision oncology imaging, denoted as universal lesion segmentation (ULS). Manual annotation is the current clinical practice, being highly time-consuming and inconsistent on tumor's longitudinal assessment. Effectively training an automatic segmentation model is desirable but relies heavily on a large number of pixel-wise labelled data. Existing weakly-supervised segmentation approaches often struggle with regions nearby the lesion boundaries. In this paper, we present a novel weakly-supervised universal lesion segmentation method by building an attention enhanced model based on the High-Resolution Network (HRNet), named AHRNet, and propose a regional level set (RLS) loss for optimizing lesion boundary delineation. AHRNet provides advanced high-resolution deep image features by involving a decoder, dual-attention and scale attention mechanisms, which are crucial to performing accurate lesion segmentation. RLS can optimize the model reliably and effectively in a weakly-supervised fashion, forcing the segmentation close to lesion boundary. Extensive experimental results demonstrate that our method achieves the best performance on the publicly large-scale DeepLesion dataset and a hold-out test set.
△ Less
Submitted 3 May, 2021;
originally announced May 2021.
-
The Optimal Location and Size of an Intermediate Coil in a Magnetic Resonant Coupling Wireless Power Transfer System
Authors:
Kedi Yan,
Gregory E. Moore,
Joshua R. Smith
Abstract:
To increase the transmission distance of Wireless Power Transfer (WPT) systems, we provide guidelines on choosing the optimal location of an Intermediate Coil with respect to size within a standard five-coil axially aligned experimental setup. From our results, for maximum magnitude of S21 at the resonant frequency we found the optimal location to exist where the coupling coefficient between the T…
▽ More
To increase the transmission distance of Wireless Power Transfer (WPT) systems, we provide guidelines on choosing the optimal location of an Intermediate Coil with respect to size within a standard five-coil axially aligned experimental setup. From our results, for maximum magnitude of S21 at the resonant frequency we found the optimal location to exist where the coupling coefficient between the Transmitter and the Intermediate Coil and the coupling coefficient between the Receiver and the Intermediate Coil are identical. Additionally, the optimal outer diameter for the maximum magnitude of S21 at the resonant frequency of the Intermediate Coil in the given symmetric and asymmetric setup are found to be larger than both TX and RX.
△ Less
Submitted 30 November, 2020;
originally announced December 2020.
-
Research on Intelligent Charging System Technology of Automobile Group
Authors:
Kedi Yan
Abstract:
This paper analyzes the smart charging system for dealing with issues related to large parking garages, and analyzes the relevant technical standards of intelligent charging piles application and comprehensive transportation hubs. It mainly includes the number, area and charging method of the charging piles installed in the garages. New forms of construction management is conceived, and the invest…
▽ More
This paper analyzes the smart charging system for dealing with issues related to large parking garages, and analyzes the relevant technical standards of intelligent charging piles application and comprehensive transportation hubs. It mainly includes the number, area and charging method of the charging piles installed in the garages. New forms of construction management is conceived, and the investment and construction are reinforced, apportioning all charging system property rights to investment parties. Simultaneously, the investors take full responsibility for the future management and operation, mainly including the collection of service fees and charging fees.
△ Less
Submitted 30 November, 2020;
originally announced December 2020.
-
Microscope Based HER2 Scoring System
Authors:
Jun Zhang,
Kuan Tian,
Pei Dong,
Haocheng Shen,
Kezhou Yan,
Jianhua Yao,
Junzhou Huang,
Xiao Han
Abstract:
The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancers, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify the HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such…
▽ More
The overexpression of human epidermal growth factor receptor 2 (HER2) has been established as a therapeutic target in multiple types of cancers, such as breast and gastric cancers. Immunohistochemistry (IHC) is employed as a basic HER2 test to identify the HER2-positive, borderline, and HER2-negative patients. However, the reliability and accuracy of HER2 scoring are affected by many factors, such as pathologists' experience. Recently, artificial intelligence (AI) has been used in various disease diagnosis to improve diagnostic accuracy and reliability, but the interpretation of diagnosis results is still an open problem. In this paper, we propose a real-time HER2 scoring system, which follows the HER2 scoring guidelines to complete the diagnosis, and thus each step is explainable. Unlike the previous scoring systems based on whole-slide imaging, our HER2 scoring system is integrated into an augmented reality (AR) microscope that can feedback AI results to the pathologists while reading the slide. The pathologists can help select informative fields of view (FOVs), avoiding the confounding regions, such as DCIS. Importantly, we illustrate the intermediate results with membrane staining condition and cell classification results, making it possible to evaluate the reliability of the diagnostic results. Also, we support the interactive modification of selecting regions-of-interest, making our system more flexible in clinical practice. The collaboration of AI and pathologists can significantly improve the robustness of our system. We evaluate our system with 285 breast IHC HER2 slides, and the classification accuracy of 95\% shows the effectiveness of our HER2 scoring system.
△ Less
Submitted 14 September, 2020;
originally announced September 2020.
-
Lymph Node Gross Tumor Volume Detection and Segmentation via Distance-based Gating using 3D CT/PET Imaging in Radiotherapy
Authors:
Zhuotun Zhu,
Dakai Jin,
Ke Yan,
Tsung-Ying Ho,
Xianghua Ye,
Dazhou Guo,
Chun-Hung Chao,
Jing Xiao,
Alan Yuille,
Le Lu
Abstract:
Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance. In radiotherapy, they are referred to as Lymph Node Gross Tumor Volume (GTVLN). Determining and delineating the spread of GTVLN is essential in defining the corresponding resection and irradiating regions for the downstream workflows of surgical…
▽ More
Finding, identifying and segmenting suspicious cancer metastasized lymph nodes from 3D multi-modality imaging is a clinical task of paramount importance. In radiotherapy, they are referred to as Lymph Node Gross Tumor Volume (GTVLN). Determining and delineating the spread of GTVLN is essential in defining the corresponding resection and irradiating regions for the downstream workflows of surgical resection and radiotherapy of various cancers. In this work, we propose an effective distance-based gating approach to simulate and simplify the high-level reasoning protocols conducted by radiation oncologists, in a divide-and-conquer manner. GTVLN is divided into two subgroups of tumor-proximal and tumor-distal, respectively, by means of binary or soft distance gating. This is motivated by the observation that each category can have distinct though overlapping distributions of appearance, size and other LN characteristics. A novel multi-branch detection-by-segmentation network is trained with each branch specializing on learning one GTVLN category features, and outputs from multi-branch are fused in inference. The proposed method is evaluated on an in-house dataset of $141$ esophageal cancer patients with both PET and CT imaging modalities. Our results validate significant improvements on the mean recall from $72.5\%$ to $78.2\%$, as compared to previous state-of-the-art work. The highest achieved GTVLN recall of $82.5\%$ at $20\%$ precision is clinically relevant and valuable since human observers tend to have low sensitivity (around $80\%$ for the most experienced radiation oncologists, as reported by literature).
△ Less
Submitted 26 August, 2020;
originally announced August 2020.
-
Reliable Liver Fibrosis Assessment from Ultrasound using Global Hetero-Image Fusion and View-Specific Parameterization
Authors:
Bowen Li,
Ke Yan,
Dar-In Tai,
Yuankai Huo,
Le Lu,
Jing Xiao,
Adam P. Harrison
Abstract:
Ultrasound (US) is a critical modality for diagnosing liver fibrosis. Unfortunately, assessment is very subjective, motivating automated approaches. We introduce a principled deep convolutional neural network (CNN) workflow that incorporates several innovations. First, to avoid overfitting on non-relevant image features, we force the network to focus on a clinical region of interest (ROI), encompa…
▽ More
Ultrasound (US) is a critical modality for diagnosing liver fibrosis. Unfortunately, assessment is very subjective, motivating automated approaches. We introduce a principled deep convolutional neural network (CNN) workflow that incorporates several innovations. First, to avoid overfitting on non-relevant image features, we force the network to focus on a clinical region of interest (ROI), encompassing the liver parenchyma and upper border. Second, we introduce global heteroimage fusion (GHIF), which allows the CNN to fuse features from any arbitrary number of images in a study, increasing its versatility and flexibility. Finally, we use 'style'-based view-specific parameterization (VSP) to tailor the CNN processing for different viewpoints of the liver, while keeping the majority of parameters the same across views. Experiments on a dataset of 610 patient studies (6979 images) demonstrate that our pipeline can contribute roughly 7% and 22% improvements in partial area under the curve and recall at 90% precision, respectively, over conventional classifiers, validating our approach to this crucial problem.
△ Less
Submitted 7 August, 2020;
originally announced August 2020.
-
One Click Lesion RECIST Measurement and Segmentation on CT Scans
Authors:
Youbao Tang,
Ke Yan,
Jing Xiao,
Ranold M. Summers
Abstract:
In clinical trials, one of the radiologists' routine work is to measure tumor sizes on medical images using the RECIST criteria (Response Evaluation Criteria In Solid Tumors). However, manual measurement is tedious and subject to inter-observer variability. We propose a unified framework named SEENet for semi-automatic lesion \textit{SE}gmentation and RECIST \textit{E}stimation on a variety of les…
▽ More
In clinical trials, one of the radiologists' routine work is to measure tumor sizes on medical images using the RECIST criteria (Response Evaluation Criteria In Solid Tumors). However, manual measurement is tedious and subject to inter-observer variability. We propose a unified framework named SEENet for semi-automatic lesion \textit{SE}gmentation and RECIST \textit{E}stimation on a variety of lesions over the entire human body. The user is only required to provide simple guidance by clicking once near the lesion. SEENet consists of two main parts. The first one extracts the lesion of interest with the one-click guidance, roughly segments the lesion, and estimates its RECIST measurement. Based on the results of the first network, the second one refines the lesion segmentation and RECIST estimation. SEENet achieves state-of-the-art performance in lesion segmentation and RECIST estimation on the large-scale public DeepLesion dataset. It offers a practical tool for radiologists to generate reliable lesion measurements (i.e. segmentation mask and RECIST) with minimal human effort and greatly reduced time.
△ Less
Submitted 21 July, 2020;
originally announced July 2020.
-
Gate Decorator: Global Filter Pruning Method for Accelerating Deep Convolutional Neural Networks
Authors:
Zhonghui You,
Kun Yan,
Jinmian Ye,
Meng Ma,
Ping Wang
Abstract:
Filter pruning is one of the most effective ways to accelerate and compress convolutional neural networks (CNNs). In this work, we propose a global filter pruning algorithm called Gate Decorator, which transforms a vanilla CNN module by multiplying its output by the channel-wise scaling factors, i.e. gate. When the scaling factor is set to zero, it is equivalent to removing the corresponding filte…
▽ More
Filter pruning is one of the most effective ways to accelerate and compress convolutional neural networks (CNNs). In this work, we propose a global filter pruning algorithm called Gate Decorator, which transforms a vanilla CNN module by multiplying its output by the channel-wise scaling factors, i.e. gate. When the scaling factor is set to zero, it is equivalent to removing the corresponding filter. We use Taylor expansion to estimate the change in the loss function caused by setting the scaling factor to zero and use the estimation for the global filter importance ranking. Then we prune the network by removing those unimportant filters. After pruning, we merge all the scaling factors into its original module, so no special operations or structures are introduced. Moreover, we propose an iterative pruning framework called Tick-Tock to improve pruning accuracy. The extensive experiments demonstrate the effectiveness of our approaches. For example, we achieve the state-of-the-art pruning ratio on ResNet-56 by reducing 70% FLOPs without noticeable loss in accuracy. For ResNet-50 on ImageNet, our pruned model with 40% FLOPs reduction outperforms the baseline model by 0.31% in top-1 accuracy. Various datasets are used, including CIFAR-10, CIFAR-100, CUB-200, ImageNet ILSVRC-12 and PASCAL VOC 2011. Code is available at github.com/youzhonghui/gate-decorator-pruning
△ Less
Submitted 17 September, 2019;
originally announced September 2019.