Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study

Teramoto, Atsushi; Michiba, Ayano; Kiriyama, Yuka; Tsukamoto, Tetsuya; Imaizumi, Kazuyoshi; Fujita, Hiroshi

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2403.18151 (eess)

[Submitted on 26 Mar 2024]

Title:Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study

Authors:Atsushi Teramoto, Ayano Michiba, Yuka Kiriyama, Tetsuya Tsukamoto, Kazuyoshi Imaizumi, Hiroshi Fujita

View PDF

Abstract:Cytology plays a crucial role in lung cancer diagnosis. Pulmonary cytology involves cell morphological characterization in the specimen and reporting the corresponding findings, which are extremely burdensome tasks. In this study, we propose a report-generation technique for lung cytology images. In total, 71 benign and 135 malignant pulmonary cytology specimens were collected. Patch images were extracted from the captured specimen images, and the findings were assigned to each image as a dataset for report generation. The proposed method consists of a vision model and a text decoder. In the former, a convolutional neural network (CNN) is used to classify a given image as benign or malignant, and the features related to the image are extracted from the intermediate layer. Independent text decoders for benign and malignant cells are prepared for text generation, and the text decoder switches according to the CNN classification results. The text decoder is configured using a Transformer that uses the features obtained from the CNN for report generation. Based on the evaluation results, the sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification, and the saliency map indicated characteristic benign and malignant areas. The grammar and style of the generated texts were confirmed as correct and in better agreement with gold standard compared to existing LLM-based image-captioning methods and single-text-decoder ablation model. These results indicate that the proposed method is useful for pulmonary cytology classification and reporting.

Comments:	This work has been submitted to the IEEE for possible publication
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Cite as:	arXiv:2403.18151 [eess.IV]
	(or arXiv:2403.18151v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2403.18151

Submission history

From: Atsushi Teramoto [view email]
[v1] Tue, 26 Mar 2024 23:32:29 UTC (695 KB)

Electrical Engineering and Systems Science > Image and Video Processing

Title:Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Electrical Engineering and Systems Science > Image and Video Processing

Title:Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators