Search | arXiv e-print repository

arXiv:2504.20220 [pdf, other]

A Multimodal Pipeline for Clinical Data Extraction: Applying Vision-Language Models to Scans of Transfusion Reaction Reports

Authors: Henning Schäfer, Cynthia S. Schmidt, Johannes Wutzkowsky, Kamil Lorek, Lea Reinartz, Johannes Rückert, Christian Temme, Britta Böckmann, Peter A. Horn, Christoph M. Friedrich

Abstract: Despite the growing adoption of electronic health records, many processes still rely on paper documents, reflecting the heterogeneous real-world conditions in which healthcare is delivered. The manual transcription process is time-consuming and prone to errors when transferring paper-based data to digital formats. To streamline this workflow, this study presents an open-source pipeline that extrac… ▽ More Despite the growing adoption of electronic health records, many processes still rely on paper documents, reflecting the heterogeneous real-world conditions in which healthcare is delivered. The manual transcription process is time-consuming and prone to errors when transferring paper-based data to digital formats. To streamline this workflow, this study presents an open-source pipeline that extracts and categorizes checkbox data from scanned documents. Demonstrated on transfusion reaction reports, the design supports adaptation to other checkbox-rich document types. The proposed method integrates checkbox detection, multilingual optical character recognition (OCR) and multilingual vision-language models (VLMs). The pipeline achieves high precision and recall compared against annually compiled gold-standards from 2017 to 2024. The result is a reduction in administrative workload and accurate regulatory reporting. The open-source availability of this pipeline encourages self-hosted parsing of checkbox forms. △ Less

Submitted 28 April, 2025; originally announced April 2025.

MSC Class: 68T07 ACM Class: I.7.5; I.4.7; I.2.7; H.3.3; J.3

arXiv:2405.10004 [pdf, other]

doi 10.1038/s41597-024-03496-6

ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

Authors: Johannes Rückert, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Cynthia S. Schmidt, Sven Koitka, Obioma Pelka, Asma Ben Abacha, Alba G. Seco de Herrera, Henning Müller, Peter A. Horn, Felix Nensa, Christoph M. Friedrich

Abstract: Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated versio… ▽ More Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and adds 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models, and evaluation of deep learning models for multi-task learning. △ Less

Submitted 18 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted for Scientific Data

arXiv:2309.01373 [pdf, other]

doi 10.1007/978-3-031-43849-3_5

PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints using Literature Databases

Authors: Louise Bloch, Johannes Rückert, Christoph M. Friedrich

Abstract: The growing impact of preprint servers enables the rapid sharing of time-sensitive research. Likewise, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed,… ▽ More The growing impact of preprint servers enables the rapid sharing of time-sensitive research. Likewise, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is often missing from preprint servers. To overcome this problem, the PreprintResolver was developed, which uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef / CrossCite) to identify preprint-publication pairs for the arXiv preprint server. The target audience focuses on, but is not limited to inexperienced researchers and students, especially from the field of computer science. The tool is based on a fuzzy matching of author surnames, titles, and DOIs. Experiments were performed on a sample of 1,000 arXiv-preprints from the research field of computer science and without any publication information. With 77.94 %, computer science is highly affected by missing publication information in arXiv. The results show that the PreprintResolver was able to resolve 603 out of 1,000 (60.3 %) arXiv-preprints from the research field of computer science and without any publication information. All four literature databases contributed to the final result. In a manual validation, a random sample of 100 resolved preprints was checked. For all preprints, at least one result is plausible. For nine preprints, more than one result was identified, three of which are partially invalid. In conclusion the PreprintResolver is suitable for individual, manually reviewed requests, but less suitable for bulk requests. The PreprintResolver tool (https://preprintresolver.eu, Available from 2023-08-01) and source code (https://gitlab.com/ippolis_wp3/preprint-resolver, Accessed: 2023-07-19) is available online. △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted for International Conference on Theory and Practice of Digital Libraries (TPDL 2023)

arXiv:2010.03341 [pdf, other]

doi 10.1016/j.compbiomed.2021.104596

Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive Evaluation

Authors: Moi Hoon Yap, Ryo Hachiuma, Azadeh Alavi, Raphael Brungel, Bill Cassidy, Manu Goyal, Hongtao Zhu, Johannes Ruckert, Moshe Olshansky, Xiao Huang, Hideo Saito, Saeed Hassanpour, Christoph M. Friedrich, David Ascher, Anping Song, Hiroki Kajita, David Gillespie, Neil D. Reeves, Joseph Pappachan, Claire O'Shea, Eibe Frank

Abstract: There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 i… ▽ More There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 images for testing. This paper summarises the results of DFUC2020 by comparing the deep learning-based algorithms proposed by the winning teams: Faster R-CNN, three variants of Faster R-CNN and an ensemble method; YOLOv3; YOLOv5; EfficientDet; and a new Cascade Attention Network. For each deep learning method, we provide a detailed description of model architecture, parameter settings for training and additional stages including pre-processing, data augmentation and post-processing. We provide a comprehensive evaluation for each method. All the methods required a data augmentation stage to increase the number of images available for training and a post-processing stage to remove false positives. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434. Finally, we demonstrate that the ensemble method based on different deep learning methods can enhanced the F1-Score but not the mAP. △ Less

Submitted 24 May, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: 19 pages, 18 figures, 10 tables

Journal ref: Computers in Biology and Medicine, Volume 135, 2021, 104596, ISSN 0010-4825,

arXiv:1812.11439 [pdf, other]

A Comprehensive Analysis of Swarming-based Live Streaming to Leverage Client Heterogeneity

Authors: Wasiur R. KhudaBukhsh, Julius Rückert, Julian Wulfheide, David Hausheer, Heinz Koeppl

Abstract: Due to missing IP multicast support on an Internet scale, over-the-top media streams are delivered with the help of overlays as used by content delivery networks and their peer-to-peer (P2P) extensions. In this context, mesh/pull-based swarming plays an important role either as pure streaming approach or in combination with tree/push mechanisms. However, the impact of realistic client populations… ▽ More Due to missing IP multicast support on an Internet scale, over-the-top media streams are delivered with the help of overlays as used by content delivery networks and their peer-to-peer (P2P) extensions. In this context, mesh/pull-based swarming plays an important role either as pure streaming approach or in combination with tree/push mechanisms. However, the impact of realistic client populations with heterogeneous resources is not yet fully understood. In this technical report, we contribute to closing this gap by mathematically analysing the most basic scheduling mechanisms latest deadline first (LDF) and earliest deadline first (EDF) in a continuous time Markov chain framework and combining them into a simple, yet powerful, mixed strategy to leverage inherent differences in client resources. The main contributions are twofold: (1) a mathematical framework for swarming on random graphs is proposed with a focus on LDF and EDF strategies in heterogeneous scenarios; (2) a mixed strategy, named SchedMix, is proposed that leverages peer heterogeneity. The proposed strategy, SchedMix is shown to outperform the other two strategies using different abstractions: a mean-field theoretic analysis of buffer probabilities, simulations of a stochastic model on random graphs, and a full-stack implementation of a P2P streaming system. △ Less

Submitted 29 December, 2018; originally announced December 2018.

Comments: Technical report and supplementary material to http://ieeexplore.ieee.org/document/7497234/

Showing 1–5 of 5 results for author: Rückert, J