-
CISOL: An Open and Extensible Dataset for Table Structure Recognition in the Construction Industry
Authors:
David Tschirschwitz,
Volker Rodehorst
Abstract:
Reproducibility and replicability are critical pillars of empirical research, particularly in machine learning, where they depend not only on the availability of models, but also on the datasets used to train and evaluate those models. In this paper, we introduce the Construction Industry Steel Ordering List (CISOL) dataset, which was developed with a focus on transparency to ensure reproducibility, replicability, and extensibility. CISOL provides a valuable new research resource and highlights the importance of having diverse datasets, even in niche application domains such as table extraction in civil engineering.
CISOL is unique in that it contains real-world civil engineering documents from industry, making it a distinctive contribution to the field. The dataset contains more than 120,000 annotated instances in over 800 document images, positioning it as a medium-sized dataset that provides a robust foundation for Table Structure Recognition (TSR) and Table Detection (TD) tasks.
Benchmarking results show that the YOLOv8 model achieves 67.22 mAP@0.5:0.95:0.05 on CISOL, outperforming the TSR-specific TATR model. This highlights the effectiveness of CISOL as a benchmark for advancing TSR, especially in specialized domains.
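The metric notation mAP@0.5:0.95:0.05 refers to average precision averaged over a range of IoU thresholds (from 0.5 to 0.95 in steps of 0.05). The following is a minimal sketch of the two ingredients, box IoU and the threshold grid; the function names are illustrative and not taken from the paper, and the per-threshold AP values are assumed to come from a separate evaluation step.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero so that disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """Threshold grid implied by the notation start:stop:step (inclusive)."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def mean_over_thresholds(ap_values):
    """Average per-threshold AP values (assumed to be computed elsewhere)."""
    return sum(ap_values) / len(ap_values)
```

A prediction counts as a match at a given threshold only if its IoU with a ground-truth box reaches that threshold, so the averaged score rewards tight localization more than a single-threshold metric such as AP50.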
Submitted 26 January, 2025;
originally announced January 2025.
-
Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval
Authors:
Morris Florek,
David Tschirschwitz,
Björn Barz,
Volker Rodehorst
Abstract:
Current image retrieval systems often face domain specificity and generalization issues. This study aims to overcome these limitations by developing a computationally efficient training framework for a universal feature extractor that provides strong semantic image representations across various domains. To this end, we curated a multi-domain training dataset, called M4D-35k, which allows for resource-efficient training. Additionally, we conduct an extensive evaluation and comparison of various state-of-the-art visual-semantic foundation models and margin-based metric learning loss functions regarding their suitability for efficient universal feature extraction. Despite constrained computational resources, we achieve near state-of-the-art results on the Google Universal Image Embedding Challenge, with an mMP@5 of 0.721. This places our method second on the leaderboard, just 0.7 percentage points behind the best-performing method. However, our model has 32% fewer overall parameters and 289 times fewer trainable parameters. Compared to methods with similar computational requirements, we outperform the previous state of the art by 3.3 percentage points. We release our code and M4D-35k training set annotations at https://github.com/morrisfl/UniFEx.
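For readers unfamiliar with mMP@5, the sketch below shows one way to compute a modified mean precision at k. It assumes the common formulation in which each query's precision is normalized by min(n_q, k), where n_q is the number of relevant index images for that query; this definition and the function name are assumptions, not details stated in the abstract.

```python
def mmp_at_k(ranked_relevance, num_relevant, k=5):
    """Modified mean precision at k (assumed formulation).

    ranked_relevance: one list per query with 0/1 flags for the ranked
        retrieval results, best match first.
    num_relevant: number of relevant index images per query (n_q).
    """
    if len(ranked_relevance) != len(num_relevant):
        raise ValueError("one n_q value is required per query")
    total = 0.0
    for rel, n_q in zip(ranked_relevance, num_relevant):
        if n_q < 1:
            raise ValueError("each query needs at least one relevant image")
        # A query with fewer than k relevant images is not penalized for
        # the slots it could never fill.
        cutoff = min(n_q, k)
        total += sum(rel[:cutoff]) / cutoff
    return total / len(ranked_relevance)


# Two hypothetical queries: 3 of 5 correct, and 2 of 2 possible correct.
score = mmp_at_k([[1, 0, 1, 1, 0], [1, 1, 0, 0, 0]], [10, 2])
```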
Submitted 20 September, 2024;
originally announced September 2024.
-
Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations
Authors:
David Tschirschwitz,
Volker Rodehorst
Abstract:
Annotation errors are a challenge not only during training of machine learning models, but also during their evaluation. Label variations and inaccuracies in datasets often manifest as contradictory examples that deviate from established labeling conventions. Such inconsistencies, when significant, prevent models from achieving optimal performance on metrics such as mean Average Precision (mAP). We introduce the notion of "label convergence" to describe the highest achievable performance under the constraint of contradictory test annotations, essentially defining an upper bound on model accuracy.
Recognizing that noise is an inherent characteristic of all data, our study analyzes five real-world datasets, including the LVIS dataset, to investigate the phenomenon of label convergence. We approximate that label convergence is between 62.63-67.52 mAP@[0.5:0.95:0.05] for LVIS with 95% confidence, attributing these bounds to the presence of real annotation errors. With current state-of-the-art (SOTA) models at the upper end of the label convergence interval for the well-studied LVIS dataset, we conclude that model capacity is sufficient to solve current object detection problems. Therefore, future efforts should focus on three key aspects: (1) updating the problem specification and adjusting evaluation practices to account for unavoidable label noise, (2) creating cleaner data, especially test data, and (3) including multi-annotated data to investigate annotation variation and make these issues visible from the outset.
Submitted 21 January, 2025; v1 submitted 14 September, 2024;
originally announced September 2024.
-
ENSTRECT: A Stage-based Approach to 2.5D Structural Damage Detection
Authors:
Christian Benz,
Volker Rodehorst
Abstract:
To effectively assess structural damage, it is essential to localize the instances of damage in the physical world of a civil structure. ENSTRECT is a stage-based approach designed to accomplish 2.5D structural damage detection. The method requires an image collection, the relative orientation, and a point cloud. Using these inputs, surface damages are segmented at the image level and then mapped into the point cloud space, resulting in a segmented point cloud. To enable further quantitative analyses, the segmented point cloud is transformed into measurable damage instances: cracks are extracted by contracting the clustered point cloud into a corresponding medial axis. For areal damages, such as spalling and corrosion, a procedure is proposed to compute the bounding polygon based on PCA and alpha shapes. With a localization tolerance of 4 cm, ENSTRECT can achieve IoUs of over 90% for cracks, 82% for corrosion, and 41% for spalling. Detection at the instance level yields an AP50 of about 45% (cracks, spalling) and 56% (corrosion).
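The PCA step mentioned for areal damages can be illustrated with a short sketch. It only shows a plausible first stage, projecting a 3D point cluster onto its best-fit plane so that a 2D boundary method (such as alpha shapes, omitted here) could be applied afterwards. The function name and return values are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def project_to_pca_plane(points):
    """Project an (N, 3) point cluster onto its best-fit plane via PCA.

    Requires at least three points. Returns the 2D in-plane coordinates,
    the cluster centroid, the two in-plane axes, and the plane normal.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:2]   # two directions of largest spread span the plane
    normal = vt[2]   # direction of least spread approximates the normal
    coords_2d = centered @ basis.T
    return coords_2d, centroid, basis, normal


# Hypothetical planar patch lying in z = 0.
patch = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [2, 0.5, 0]]
coords_2d, centroid, basis, normal = project_to_pca_plane(patch)
```

Because the in-plane axes are orthonormal, distances within the plane are preserved, so areas and outlines measured in the 2D coordinates remain in the units of the point cloud.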
Submitted 2 October, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Drawing the Same Bounding Box Twice? Coping Noisy Annotations in Object Detection with Repeated Labels
Authors:
David Tschirschwitz,
Christian Benz,
Morris Florek,
Henrik Norderhus,
Benno Stein,
Volker Rodehorst
Abstract:
The reliability of supervised machine learning systems depends on the accuracy and availability of ground truth labels. However, the process of human annotation, being prone to error, introduces the potential for noisy labels, which can impede the practicality of these systems. While training with noisy labels is a significant consideration, the reliability of test data is also crucial to ascertain the dependability of the results. A common approach to addressing this issue is repeated labeling, where multiple annotators label the same example, and their labels are combined to provide a better estimate of the true label. In this paper, we propose a novel localization algorithm that adapts well-established ground truth estimation methods for object detection and instance segmentation tasks. The key innovation of our method lies in its ability to transform combined localization and classification tasks into classification-only problems, thus enabling the application of techniques such as Expectation-Maximization (EM) or Majority Voting (MJV). Although our main focus is the aggregation of unique ground truth for test data, our algorithm also shows superior performance during training on the TexBiG dataset, surpassing both noisy label training and label aggregation using Weighted Boxes Fusion (WBF). Our experiments indicate that the benefits of repeated labels emerge under specific dataset and annotation configurations. The key factors appear to be (1) dataset complexity, (2) annotator consistency, and (3) the given annotation budget constraints.
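Once boxes from different annotators have been grouped into clusters that refer to the same object, the aggregation reduces to a per-cluster classification problem. The sketch below illustrates plain majority voting over such clusters. The grouping step is assumed to have happened already, and the "background" vote for annotators who drew no box, the tie handling, and all names are illustrative assumptions rather than the paper's algorithm.

```python
from collections import Counter

# Assumed placeholder vote for an annotator who marked no object in a cluster.
BACKGROUND = "background"


def majority_vote(votes):
    """Return the most frequent label, or None if the top count is tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def aggregate(clusters):
    """Keep clusters whose majority label is a real class.

    clusters: dict mapping a cluster id to the list of per-annotator votes.
    """
    result = {}
    for cluster_id, votes in clusters.items():
        label = majority_vote(votes)
        if label is not None and label != BACKGROUND:
            result[cluster_id] = label
    return result


# Three hypothetical clusters: clear majority, majority "no object", and a tie.
clusters = {
    "c1": ["table", "table", "figure"],
    "c2": [BACKGROUND, BACKGROUND, "table"],
    "c3": ["table", "figure"],
}
ground_truth = aggregate(clusters)
```

Treating a missing box as an explicit vote is what lets a classification-only method decide both whether an object exists and which class it belongs to.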
Submitted 18 September, 2023;
originally announced September 2023.