-
ModernBERT is More Efficient than Conventional BERT for Chest CT Findings Classification in Japanese Radiology Reports
Authors:
Yosuke Yamagishi,
Tomohiro Kikuchi,
Shouhei Hanaoka,
Takeharu Yoshikawa,
Osamu Abe
Abstract:
Objective: This study aims to evaluate and compare the performance of two Japanese language models, conventional Bidirectional Encoder Representations from Transformers (BERT) and the newer ModernBERT, in classifying findings from chest CT reports, with a focus on tokenization efficiency, processing time, and classification performance. Methods: We conducted a retrospective study using the CT-RATE-JPN dataset containing 22,778 training reports and 150 test reports. Both models were fine-tuned for multi-label classification of 18 common chest CT conditions. The training data were split 18,222:4,556 into training and validation sets. Performance was evaluated using F1 scores for each condition and exact match accuracy across all 18 labels. Results: ModernBERT demonstrated superior tokenization efficiency, requiring 24.0% fewer tokens per document (258.1 vs. 339.6) compared to BERT Base. This translated into substantial speed improvements, with ModernBERT completing training in 1877.67 seconds versus BERT's 3090.54 seconds (39% reduction). ModernBERT processed 38.82 samples per second during training (1.65x faster) and 139.90 samples per second during inference (1.66x faster). Despite these efficiency gains, classification performance remained comparable, with ModernBERT achieving superior F1 scores in 8 conditions, while BERT performed better in 4 conditions. Overall exact match accuracy was slightly higher for ModernBERT (74.67% vs. 72.67%), though this difference was not statistically significant (p=0.6291). Conclusion: ModernBERT offers substantial improvements in tokenization efficiency and training speed without sacrificing classification performance. These results suggest that ModernBERT is a promising candidate for clinical applications in the analysis of Japanese radiology reports.
Submitted 6 March, 2025;
originally announced March 2025.
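The ModernBERT entry above reports per-condition F1 scores and exact match accuracy across 18 labels. The following is a minimal sketch of how those two metrics are commonly computed for multi-label predictions; it is not the authors' evaluation code. The function names and the toy three-label data are illustrative assumptions, as is the convention of returning an F1 of 0.0 for a label with no positive truths or predictions.

```python
from typing import List, Sequence


def exact_match_accuracy(y_true: Sequence[Sequence[int]],
                         y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of reports whose predicted label vector matches the truth on every label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def per_label_f1(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """Binary F1 score computed separately for each label column."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0.0 when a label has no positives at all.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Toy data (hypothetical): 3 reports, 3 labels instead of the 18 used in the study.
truth = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
preds = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
accuracy = exact_match_accuracy(truth, preds)
f1_scores = per_label_f1(truth, preds)
```

Exact match is a strict metric: one wrong label out of 18 makes the whole report count as a miss, which is why it can sit well below the per-condition F1 scores.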
-
Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model
Authors:
Yosuke Yamagishi,
Yuta Nakamura,
Tomohiro Kikuchi,
Yuki Sonoda,
Hiroshi Hirakawa,
Shintaro Kano,
Satoshi Nakamura,
Shouhei Hanaoka,
Takeharu Yoshikawa,
Osamu Abe
Abstract:
Background: Recent advances in large language models highlight the need for high-quality multilingual medical datasets. While Japan leads globally in CT scanner deployment and utilization, the lack of large-scale Japanese radiology datasets has hindered the development of specialized language models for medical imaging analysis. Objective: To develop a comprehensive Japanese CT report dataset through machine translation, to establish a specialized language model for structured finding classification, and to create a rigorously validated evaluation dataset through expert radiologist review. Methods: We translated the CT-RATE dataset (24,283 CT reports from 21,304 patients) into Japanese using GPT-4o mini. The training dataset consisted of 22,778 machine-translated reports, while the validation dataset included 150 radiologist-revised reports. We developed CT-BERT-JPN based on the "tohoku-nlp/bert-base-japanese-v3" architecture for extracting 18 structured findings from Japanese radiology reports. Results: Translation metrics showed strong performance with BLEU scores of 0.731 and 0.690, and ROUGE scores ranging from 0.770 to 0.876 for Findings and from 0.748 to 0.857 for Impression sections. CT-BERT-JPN demonstrated superior performance compared to GPT-4o in 11 out of 18 conditions, including lymphadenopathy (+14.2%), interlobular septal thickening (+10.9%), and atelectasis (+7.4%). The model maintained F1 scores exceeding 0.95 in 14 out of 18 conditions and achieved perfect scores in four conditions. Conclusions: Our study establishes a robust Japanese CT report dataset and demonstrates the effectiveness of a specialized language model for structured finding classification. The hybrid approach of machine translation and expert validation enables the creation of large-scale medical datasets while maintaining high quality.
Submitted 20 December, 2024;
originally announced December 2024.
-
Zero-shot 3D Segmentation of Abdominal Organs in CT Scans Using Segment Anything Model 2: Adapting Video Tracking Capabilities for 3D Medical Imaging
Authors:
Yosuke Yamagishi,
Shouhei Hanaoka,
Tomohiro Kikuchi,
Takahiro Nakao,
Yuta Nakamura,
Yukihiro Nomura,
Soichiro Miki,
Takeharu Yoshikawa,
Osamu Abe
Abstract:
Objectives: To evaluate the zero-shot performance of Segment Anything Model 2 (SAM 2) in 3D segmentation of abdominal organs in CT scans, and to investigate the effects of prompt settings on segmentation results.
Materials and Methods: In this retrospective study, we used a subset of the TotalSegmentator CT dataset from eight institutions to assess SAM 2's ability to segment eight abdominal organs. Segmentation was initiated from three different z-coordinate levels (caudal, mid, and cranial levels) of each organ. Performance was measured using the Dice similarity coefficient (DSC). We also analyzed the impact of "negative prompts," which explicitly exclude certain regions from the segmentation process, on accuracy.
Results: 123 patients (mean age, 60.7 ± 15.5 years; 63 men, 60 women) were evaluated. As a zero-shot approach, larger organs with clear boundaries demonstrated high segmentation performance, with mean DSCs as follows: liver 0.821 ± 0.192, right kidney 0.862 ± 0.212, left kidney 0.870 ± 0.154, and spleen 0.891 ± 0.131. Smaller organs showed lower performance: gallbladder 0.531 ± 0.291, pancreas 0.361 ± 0.197, and adrenal glands, right 0.203 ± 0.222, left 0.308 ± 0.234. The initial slice for segmentation and the use of negative prompts significantly influenced the results. By removing negative prompts from the input, the DSCs significantly decreased for six organs.
Conclusion: SAM 2 demonstrated promising zero-shot performance in segmenting certain abdominal organs in CT scans, particularly larger organs. Performance was significantly influenced by input negative prompts and initial slice selection, highlighting the importance of optimizing these factors.
Submitted 13 January, 2025; v1 submitted 12 August, 2024;
originally announced August 2024.
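The SAM 2 entry above measures segmentation quality with the Dice similarity coefficient (DSC), defined as 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B. The sketch below computes it on flattened binary masks; a 3D volume would be flattened voxel by voxel first. The function name and the choice to return 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
from typing import Iterable


def dice_coefficient(mask_a: Iterable[int], mask_b: Iterable[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2 * intersection / total


# Toy example (hypothetical): 4 voxels, one of which is shared by both masks.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
dsc = dice_coefficient(predicted, reference)
```

Because the denominator is the sum of both mask sizes, a few misclassified voxels cost a small organ far more DSC than a large one, which is worth keeping in mind when comparing the adrenal gland and liver scores.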
-
Local Differential Privacy Image Generation Using Flow-based Deep Generative Models
Authors:
Hisaichi Shibata,
Shouhei Hanaoka,
Yang Cao,
Masatoshi Yoshikawa,
Tomomi Takenaga,
Yukihiro Nomura,
Naoto Hayashi,
Osamu Abe
Abstract:
Diagnostic radiologists need artificial intelligence (AI) for medical imaging, but access to the medical images required for training AI has become increasingly restrictive. To release and use medical images, we need an algorithm that can simultaneously protect privacy and preserve pathologies in medical images. To develop such an algorithm, here, we propose DP-GLOW, a hybrid of a local differential privacy (LDP) algorithm and one of the flow-based deep generative models (GLOW). By applying a GLOW model, we disentangle the pixelwise correlation of images, a correlation that makes it difficult to protect privacy with straightforward LDP algorithms for images. Specifically, we map images onto the latent vector of the GLOW model, each element of which follows an independent normal distribution, and we apply the Laplace mechanism to the latent vector. Moreover, we applied DP-GLOW to chest X-ray images to generate LDP images while preserving pathologies.
Submitted 20 December, 2022;
originally announced December 2022.
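The DP-GLOW entry above applies the Laplace mechanism to a latent vector. The sketch below shows the generic Laplace mechanism on a plain list of floats and makes no claim about how DP-GLOW itself sets its parameters. Two assumptions go beyond the abstract: each element is clipped to [-clip, clip] so that the L1 sensitivity is bounded by 2 * clip * d for a d-dimensional vector, and the whole privacy budget epsilon is spent on one vector. The GLOW encoder and decoder are omitted, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_latent(z: Sequence[float], epsilon: float, clip: float,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip a latent vector and add Laplace noise calibrated to its L1 sensitivity."""
    if epsilon <= 0 or clip <= 0:
        raise ValueError("epsilon and clip must be positive")
    if rng is None:
        rng = random.Random()
    # Assumption: clipping bounds each element, which bounds the sensitivity.
    clipped = [max(-clip, min(clip, v)) for v in z]
    # Any two clipped vectors differ by at most 2 * clip per element in L1 norm.
    sensitivity = 2.0 * clip * len(clipped)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in clipped]


# Hypothetical latent vector; in the paper's setting it would come from the GLOW encoder.
latent = [0.5, -3.0, 1.2]
noisy = privatize_latent(latent, epsilon=1.0, clip=1.0, rng=random.Random(0))
```

The noise scale grows linearly with the latent dimension under this accounting, so any practical use would have to weigh the privacy budget against how much image detail survives decoding.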
-
Aging prediction using deep generative model toward the development of preventive medicine
Authors:
Hisaichi Shibata,
Shouhei Hanaoka,
Yukihiro Nomura,
Naoto Hayashi,
Osamu Abe
Abstract:
From birth to death, we all experience surprisingly ubiquitous changes over time due to aging. If we can predict aging in the digital domain, that is, the digital twin of the human body, we would be able to detect lesions in their very early stages, thereby enhancing the quality of life and extending the life span. We observed that none of the previously developed digital twins of the adult human body explicitly learned longitudinal conversion rules between volumetric medical images with deep generative models, potentially resulting in poor prediction performance of, for example, ventricular volumes. Here, we establish a new digital twin of an adult human body that adopts longitudinally acquired head computed tomography (CT) images for training, enabling prediction of future volumetric head CT images from a single present volumetric head CT image. We, for the first time, adopt one of the three-dimensional flow-based deep generative models to realize this sequential three-dimensional digital twin. We show that our digital twin outperforms the latest methods of prediction of ventricular volumes over relatively short terms.
Submitted 23 August, 2022;
originally announced August 2022.
-
X2CT-FLOW: Maximum a posteriori reconstruction using a progressive flow-based deep generative model for ultra sparse-view computed tomography in ultra low-dose protocols
Authors:
Hisaichi Shibata,
Shouhei Hanaoka,
Yukihiro Nomura,
Takahiro Nakao,
Tomomi Takenaga,
Naoto Hayashi,
Osamu Abe
Abstract:
Ultra sparse-view computed tomography (CT) algorithms can reduce radiation exposure of patients, but those algorithms lack an explicit cycle consistency loss minimization and an explicit log-likelihood maximization in testing. Here, we propose X2CT-FLOW for the maximum a posteriori (MAP) reconstruction of a three-dimensional (3D) chest CT image from a single or a few two-dimensional (2D) projection images using a progressive flow-based deep generative model, especially for ultra low-dose protocols. The MAP reconstruction can simultaneously optimize the cycle consistency loss and the log-likelihood. The proposed algorithm is built upon a newly developed progressive flow-based deep generative model, which features exact log-likelihood estimation, efficient sampling, and progressive learning. We applied X2CT-FLOW to the reconstruction of 3D chest CT images from biplanar projection images without noise contamination (assuming a standard-dose protocol) and with strong noise contamination (assuming an ultra low-dose protocol). With the standard-dose protocol, the images we reconstructed from 2D projection images showed good agreement with the 3D ground-truth CT images in terms of structural similarity (SSIM, 0.7675 on average), peak signal-to-noise ratio (PSNR, 25.89 dB on average), mean absolute error (MAE, 0.02364 on average), and normalized root mean square error (NRMSE, 0.05731 on average). Moreover, with the ultra low-dose protocol, the reconstructed images also showed good agreement with the 3D ground-truth CT images in terms of SSIM (0.7008 on average), PSNR (23.58 dB on average), MAE (0.02991 on average), and NRMSE (0.07349 on average).
Submitted 30 September, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
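The X2CT-FLOW entry above reports PSNR, MAE, and NRMSE between reconstructed and ground-truth volumes. The sketch below gives one common definition of each on flattened intensity lists. The abstract does not say how NRMSE is normalized or what data range PSNR uses, so normalizing by the reference intensity range and passing `data_range` explicitly are assumptions; SSIM is left out because it needs windowed local statistics.

```python
import math
from typing import Sequence


def mean_absolute_error(ref: Sequence[float], rec: Sequence[float]) -> float:
    """Average absolute voxel-wise difference."""
    return sum(abs(r - x) for r, x in zip(ref, rec)) / len(ref)


def mean_squared_error(ref: Sequence[float], rec: Sequence[float]) -> float:
    """Average squared voxel-wise difference."""
    return sum((r - x) ** 2 for r, x in zip(ref, rec)) / len(ref)


def nrmse(ref: Sequence[float], rec: Sequence[float]) -> float:
    """RMSE divided by the reference intensity range (assumed normalization)."""
    value_range = max(ref) - min(ref)
    if value_range == 0:
        raise ValueError("reference has zero intensity range")
    return math.sqrt(mean_squared_error(ref, rec)) / value_range


def psnr(ref: Sequence[float], rec: Sequence[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB for a given maximum intensity span."""
    mse = mean_squared_error(ref, rec)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy intensities (hypothetical), scaled to [0, 1].
reference = [0.0, 0.5, 1.0]
reconstruction = [0.1, 0.5, 0.9]
mae_value = mean_absolute_error(reference, reconstruction)
nrmse_value = nrmse(reference, reconstruction)
psnr_value = psnr(reference, reconstruction, data_range=1.0)
```

Since NRMSE and PSNR both depend on the chosen normalization, values computed this way are only comparable to published figures when the same convention is used.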
-
KART: Parameterization of Privacy Leakage Scenarios from Pre-trained Language Models
Authors:
Yuta Nakamura,
Shouhei Hanaoka,
Yukihiro Nomura,
Naoto Hayashi,
Osamu Abe,
Shuntaro Yada,
Shoko Wakamiya,
Eiji Aramaki
Abstract:
For the safe sharing of pre-trained language models, no guidelines exist at present owing to the difficulty in estimating the upper bound of the risk of privacy leakage. One problem is that previous studies have assessed the risk for different real-world privacy leakage scenarios and attack methods, which reduces the portability of the findings. To tackle this problem, we represent complex real-world privacy leakage scenarios under a universal parameterization, Knowledge, Anonymization, Resource, and Target (KART). KART parameterization has two merits: (i) it clarifies the definition of privacy leakage in each experiment and (ii) it improves the comparability of the findings of risk assessments. We show that previous studies can be simply reviewed by parameterizing the scenarios with KART. We also demonstrate privacy risk assessments in different scenarios under the same attack method, which suggests that KART helps approximate the upper bound of risk under a specific attack or scenario. We believe that KART helps integrate past and future findings on privacy risk and will contribute to a standard for sharing language models.
Submitted 17 March, 2022; v1 submitted 31 December, 2020;
originally announced January 2021.
-
On the Matrix-Free Generation of Adversarial Perturbations for Black-Box Attacks
Authors:
Hisaichi Shibata,
Shouhei Hanaoka,
Yukihiro Nomura,
Naoto Hayashi,
Osamu Abe
Abstract:
In general, adversarial perturbations superimposed on inputs are realistic threats for a deep neural network (DNN). In this paper, we propose a practical method for generating such adversarial perturbations for black-box attacks, which require access to an input-output relationship only. Thus, the attackers generate such perturbations without invoking inner functions and/or accessing the inner states of a DNN. Unlike earlier studies, the algorithm presented in this study requires far fewer query trials to generate the perturbation. Moreover, to show the effectiveness of the extracted adversarial perturbation, we experiment with a DNN for semantic segmentation. The result shows that the network is more easily deceived by the generated perturbation than by uniformly distributed random noise of the same magnitude.
Submitted 17 February, 2020;
originally announced February 2020.
-
A versatile anomaly detection method for medical images with a flow-based generative model in semi-supervision setting
Authors:
H. Shibata,
S. Hanaoka,
Y. Nomura,
T. Nakao,
I. Sato,
D. Sato,
N. Hayashi,
O. Abe
Abstract:
Oversight in medical images is a crucial problem, and timely reporting of medical images is desired. Therefore, an all-purpose anomaly detection method that can detect virtually all types of lesions/diseases in a given image is strongly desired. However, few commercially available and versatile anomaly detection methods for medical images have been provided so far. Recently, anomaly detection methods built upon deep learning methods have been rapidly growing in popularity, and these methods seem to provide reasonable solutions to the problem. However, the workload to label the images necessary for training in deep learning remains heavy. In this study, we present an anomaly detection method based on two trained flow-based generative models. With this method, the posterior probability can be computed as a normality metric for any given image. The training of the generative models requires two sets of images: a set containing only normal images and another set containing both normal and abnormal images without any labels. In the latter set, each sample does not have to be labeled as normal or abnormal; therefore, any mixture of images (e.g., all cases in a hospital) can be used as the dataset without cumbersome manual labeling. The method was validated with two types of medical images: chest X-ray radiographs (CXRs) and brain computed tomographies (BCTs). The areas under the receiver operating characteristic curves for logarithm posterior probabilities of CXRs (0.868 for pneumonia-like opacities) and BCTs (0.904 for infarction) were comparable to those in previous studies with other anomaly detection methods. This result showed the versatility of our method.
Submitted 20 October, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
A Data as a Service (DaaS) Model for GPU-based Data Analytics
Authors:
John Olorunfemi Abe,
Burak Berk Ustundaug
Abstract:
Cloud-based services that provision resources for consumers are increasingly the norm, especially for big data, spatiotemporal data mining, and application services that impose user-agreed Quality of Service (QoS) rules or a Service Level Agreement (SLA). Given the pervasive nature of data centers and cloud systems, there is a need for real-time analytics of these systems that accounts for cost, utility, and energy. This work presents an overlay model of a GPU system for Data as a Service (DaaS) that provides real-time analysis of network data and of customer, investor, and user data from data centers or cloud systems. Using a modeled layer to define a learning protocol and system, we give a custom, profitable system for DaaS on GPU. The GPU-enabled pre-processing and initial operations of the clustering model analysis are promising, as shown in the results. We examine the model on real-world data sets to model big data or spatiotemporal data mining services. We also produce results of our model with clustering using neural-network self-organizing feature maps (SOFM or SOM) to produce a distribution of the clustering for the DaaS model. The experimental results thus far show a promising model that could enhance SLA- and/or QoS-based DaaS.
Submitted 5 February, 2018;
originally announced February 2018.