Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation

Allmendinger, Simeon; Hemmer, Patrick; Queisner, Moritz; Sauer, Igor; Müller, Leopold; Jakubik, Johannes; Vössing, Michael; Kühl, Niklas

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2312.03043 (eess)

[Submitted on 5 Dec 2023]

Title:Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation

Authors:Simeon Allmendinger, Patrick Hemmer, Moritz Queisner, Igor Sauer, Leopold Müller, Johannes Jakubik, Michael Vössing, Niklas Kühl

View PDF HTML (experimental)

Abstract:Recent advances in synthetic imaging open up opportunities for obtaining additional data in the field of surgical imaging. This data can provide reliable supplements supporting surgical applications and decision-making through computer vision. Particularly the field of image-guided surgery, such as laparoscopic and robotic-assisted surgery, benefits strongly from synthetic image datasets and virtual surgical training methods. Our study presents an intuitive approach for generating synthetic laparoscopic images from short text prompts using diffusion-based generative models. We demonstrate the usage of state-of-the-art text-to-image architectures in the context of laparoscopic imaging with regard to the surgical removal of the gallbladder as an example. Results on fidelity and diversity demonstrate that diffusion-based models can acquire knowledge about the style and semantics in the field of image-guided surgery. A validation study with a human assessment survey underlines the realistic nature of our synthetic data, as medical personnel detects actual images in a pool with generated images causing a false-positive rate of 66%. In addition, the investigation of a state-of-the-art machine learning model to recognize surgical actions indicates enhanced results when trained with additional generated images of up to 5.20%. Overall, the achieved image quality contributes to the usage of computer-generated images in surgical applications and enhances its path to maturity.

Subjects:	Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Tissues and Organs (q-bio.TO)
Cite as:	arXiv:2312.03043 [eess.IV]
	(or arXiv:2312.03043v1 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2312.03043

Submission history

From: Simeon Allmendinger [view email]
[v1] Tue, 5 Dec 2023 16:20:22 UTC (11,719 KB)

Electrical Engineering and Systems Science > Image and Video Processing

Title:Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Electrical Engineering and Systems Science > Image and Video Processing

Title:Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators