-
Template-Based Schema Matching of Multi-Layout Tenancy Schedules: A Comparative Study of a Template-Based Hybrid Matcher and the ALITE Full Disjunction Model
Authors:
Tim Uilkema,
Yao Ma,
Seyed Sahand Mohammadi Ziabari,
Joep van Vliet
Abstract:
The lack of standardized tabular formats for tenancy schedules across real estate firms creates significant inefficiencies in data integration. Existing automated integration methods, such as Full Disjunction (FD)-based models like ALITE, prioritize completeness but result in schema bloat, sparse attributes, and limited business usability. We propose a novel hybrid, template-based schema matcher that aligns multi-layout tenancy schedules to a predefined target schema. The matcher combines schema-based metrics (Jaccard, Levenshtein) and instance-based metrics (data types, distributions) with globally optimal assignments determined via the Hungarian Algorithm. Evaluation against a manually labeled ground truth demonstrates substantial improvements, with grid search optimization yielding a peak F1-score of 0.881 and an overall null percentage of 45.7%. On a separate ground truth of 20 semantically similar column sets, ALITE achieves an F1-score of 0.712 and 75.6% nulls. These results suggest that combining structured business knowledge with hybrid matching can yield more usable and business-aligned schema mappings. The approach assumes cleanly extracted tabular input; future work could explore extending the matcher to support complex, composite tables.
Submitted 2 July, 2025;
originally announced July 2025.
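The matching idea in this abstract can be illustrated with a minimal sketch, under several assumptions that are not from the paper: only the schema-based part is shown (name similarity from Jaccard and Levenshtein, blended with a hypothetical weight `w_jaccard`), the column names are invented, and the globally optimal one-to-one assignment is found by exhaustive search over permutations as a stand-in for the Hungarian Algorithm, which returns the same optimum on small inputs. It requires at most as many source columns as target columns.

```python
from itertools import permutations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def name_score(src: str, tgt: str, w_jaccard: float = 0.5) -> float:
    """Blend of Jaccard and normalised Levenshtein similarity (weight is hypothetical)."""
    s, t = src.lower(), tgt.lower()
    lev_sim = 1.0 - levenshtein(s, t) / max(len(s), len(t), 1)
    return w_jaccard * jaccard(s, t) + (1.0 - w_jaccard) * lev_sim


def best_assignment(sources, targets):
    """Exhaustively pick the one-to-one mapping with the highest total score.

    Stand-in for the Hungarian Algorithm; assumes len(sources) <= len(targets).
    """
    best, best_total = None, float("-inf")
    for perm in permutations(range(len(targets)), len(sources)):
        total = sum(name_score(sources[i], targets[j]) for i, j in enumerate(perm))
        if total > best_total:
            best, best_total = perm, total
    return {sources[i]: targets[j] for i, j in enumerate(best)}


# Illustrative column names only; they do not come from the paper's data.
template = ["tenant name", "annual rent", "lease start date", "unit area"]
layout = ["Tenant Name", "Annual rent (EUR)", "Lease start"]
mapping = best_assignment(layout, template)
```

Exhaustive search grows factorially with the number of columns, so a real implementation would replace `best_assignment` with a polynomial-time assignment solver and add instance-based scores to `name_score`.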
-
Anomalous NO2 emitting ship detection with TROPOMI satellite data and machine learning
Authors:
Solomiia Kurchaba,
Jasper van Vliet,
Fons J. Verbeek,
Cor J. Veenman
Abstract:
Starting in 2021, more demanding $\text{NO}_\text{x}$ emission restrictions were introduced for ships operating in North Sea and Baltic Sea waters. Since all methods currently used for ship compliance monitoring are costly and time-consuming, it is important to prioritize the inspection of ships that are most likely to be non-compliant. The current state-of-the-art approach for large-scale estimation of ship $\text{NO}_\text{2}$ is a supervised machine learning-based segmentation of ship plumes on TROPOMI/S5P images. However, challenging data annotation and the insufficiently complex ship emission proxy used for validation limit the applicability of the model for ship compliance monitoring. In this study, we present a method for the automated selection of potentially non-compliant ships using a combination of machine learning models on TROPOMI satellite data. It is based on a proposed regression model predicting the amount of $\text{NO}_\text{2}$ that is expected to be produced by a ship with certain properties operating in the given atmospheric conditions. The model does not require manual labeling and is validated with TROPOMI data directly. The differences between the predicted and actual amounts of produced $\text{NO}_\text{2}$ are integrated over observations of the ship in time and are used as a measure of the inspection worthiness of a ship. To ensure robustness, we compare our results with those of the previously developed segmentation-based method. Ships that also deviate strongly according to the segmentation method require further attention. If no other explanation can be found by checking the TROPOMI data, the respective ships are recommended as candidates for inspection.
Submitted 7 April, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
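The inspection-worthiness measure described in this abstract can be sketched roughly as follows. This is only an illustration under assumptions: the sign convention (observed minus predicted), the plain sum used as the "integration" over observations, the function names, and the numbers are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def inspection_scores(observations):
    """Sum (observed - predicted) NO2 per ship over all of its observations.

    `observations` is an iterable of (ship_id, predicted, observed) tuples.
    """
    totals = defaultdict(float)
    for ship_id, predicted, observed in observations:
        totals[ship_id] += observed - predicted
    return dict(totals)


def rank_ships(observations):
    """Return ship ids ordered from most to least above expectation."""
    scores = inspection_scores(observations)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up values in arbitrary units, for illustration only.
obs = [
    ("A", 1.0, 1.5),
    ("A", 2.0, 3.0),
    ("B", 1.0, 0.5),
    ("C", 2.0, 2.25),
]
scores = inspection_scores(obs)
ranking = rank_ships(obs)
```

In this toy input, ship "A" accumulates the largest positive residual and would be examined first; the abstract additionally cross-checks such candidates against the segmentation-based method before recommending inspection.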
-
Supervised segmentation of NO2 plumes from individual ships using TROPOMI satellite data
Authors:
Solomiia Kurchaba,
Jasper van Vliet,
Fons J. Verbeek,
Jacqueline J. Meulman,
Cor J. Veenman
Abstract:
The shipping industry is one of the strongest anthropogenic emitters of $\text{NO}_\text{x}$, a substance harmful to both human health and the environment. The rapid growth of the industry creates societal pressure to control the emission levels produced by ships. All the methods currently used for ship emission monitoring are costly and require proximity to a ship, which makes global and continuous emission monitoring impossible. A promising approach is the application of remote sensing. Studies have shown that some of the $\text{NO}_\text{2}$ plumes from individual ships can be visually distinguished using the TROPOspheric Monitoring Instrument on board the Copernicus Sentinel 5 Precursor (TROPOMI/S5P). To deploy a remote sensing-based global emission monitoring system, an automated procedure for the estimation of $\text{NO}_\text{2}$ emissions from individual ships is needed. The extremely low signal-to-noise ratio of the available data, as well as the absence of ground truth, makes the task very challenging. Here, we present a methodology for the automated segmentation of $\text{NO}_\text{2}$ plumes produced by seagoing ships using supervised machine learning on TROPOMI/S5P data. We show that the proposed approach leads to a more than 20\% increase in the average precision score in comparison to the methods used in previous studies and results in a high correlation of 0.834 with the theoretically derived ship emission proxy. This work is a crucial step toward the development of an automated procedure for global ship emission monitoring using remote sensing data.
Submitted 7 April, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.
-
DenseUNets with feedback non-local attention for the segmentation of specular microscopy images of the corneal endothelium with guttae
Authors:
Juan P. Vigueras-Guillén,
Jeroen van Rooij,
Bart T. H. van Dooren,
Hans G. Lemij,
Esma Islamaj,
Lucas J. van Vliet,
Koenraad A. Vermeer
Abstract:
To estimate the corneal endothelial parameters from specular microscopy images depicting cornea guttata (Fuchs dystrophy), we propose a new deep learning methodology that includes a novel attention mechanism named feedback non-local attention (fNLA). Our approach first infers the cell edges, then selects the cells that are well detected, and finally applies a postprocessing method to correct mistakes and provide the binary segmentation from which the corneal parameters are estimated (cell density [ECD], coefficient of variation [CV], and hexagonality [HEX]). In this study, we analyzed 1203 images acquired with a Topcon SP-1P microscope, 500 of which contained guttae. Manual segmentation was performed in all images. We compared the results of different networks (UNet, ResUNeXt, DenseUNets, UNet++) and found that DenseUNets with fNLA provided the best performance, with a mean absolute error of 23.16 [cells/mm$^{2}$] in ECD, 1.28 [%] in CV, and 3.13 [%] in HEX, which was 3-6 times smaller than the error obtained by Topcon's built-in software. Our approach handled the cells affected by guttae remarkably well, detecting cell edges occluded by small guttae while discarding areas covered by large guttae. Overall, the proposed method obtained accurate estimations in extremely challenging specular images.
Submitted 21 March, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
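For readers unfamiliar with the three corneal parameters named in this abstract, the sketch below computes them from a list of segmented cell areas and neighbor counts. It uses commonly cited definitions as an assumption (ECD as the reciprocal of the mean cell area, CV as the standard deviation of cell area over its mean, HEX as the share of six-sided cells); the abstract does not spell out the exact formulas, and the function name and inputs are hypothetical.

```python
from statistics import mean, pstdev


def corneal_parameters(cell_areas_um2, neighbor_counts):
    """Return (ECD in cells/mm^2, CV in %, HEX in %) for one segmented image.

    cell_areas_um2: area of each detected cell in square micrometres.
    neighbor_counts: number of neighboring cells for each detected cell.
    """
    mean_area = mean(cell_areas_um2)
    ecd = 1_000_000 / mean_area                       # 1 mm^2 = 1e6 um^2
    cv = 100.0 * pstdev(cell_areas_um2) / mean_area   # population std (assumption)
    hexagonality = 100.0 * sum(n == 6 for n in neighbor_counts) / len(neighbor_counts)
    return ecd, cv, hexagonality


# Toy inputs, not measurements from the study.
ecd, cv, hexagonality = corneal_parameters([300.0, 500.0], [6, 5])
```

With these toy inputs the mean cell area is 400 µm², so the density is 2500 cells/mm², the coefficient of variation is 25%, and half of the cells count as hexagonal.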
-
Stochastic Semantics and Statistical Model Checking for Networks of Priced Timed Automata
Authors:
Alexandre David,
Kim G. Larsen,
Axel Legay,
Marius Mikučionis,
Danny Bøgsted Poulsen,
Jonas van Vliet,
Zheng Wang
Abstract:
This paper offers a natural stochastic semantics of Networks of Priced Timed Automata (NPTA) based on races between components. The semantics provides the basis for satisfaction of probabilistic Weighted CTL properties (PWCTL), conservatively extending the classical satisfaction of timed automata with respect to TCTL. In particular, the extension allows hard real-time properties of timed automata expressible in TCTL to be refined by performance properties, e.g. in terms of probabilistic guarantees of time- and cost-bounded properties. A second contribution of the paper is the application of Statistical Model Checking (SMC) to efficiently estimate the correctness of non-nested PWCTL model checking problems with a desired level of confidence, based on a number of independent runs of the NPTA. In addition to applying classical SMC algorithms, we also offer an extension that allows efficient comparison of performance properties of NPTAs in a parametric setting. The third contribution is an efficient tool implementation of our results and applications to several case studies.
Submitted 28 November, 2014; v1 submitted 20 June, 2011;
originally announced June 2011.
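The idea of estimating a time-bounded probability from independent runs of racing components can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption for illustration: the two-component race, the exponential delay distributions, the parameter names, and the use of the Chernoff-Hoeffding bound to choose the number of runs. It is not the paper's tool or its algorithms.

```python
import math
import random


def required_runs(epsilon: float, delta: float) -> int:
    """Chernoff-Hoeffding bound: runs so that P(|p_hat - p| >= epsilon) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def race_run(rng, rate_a, rate_b, time_bound):
    """One run: both components sample a delay; success if A fires first within the bound."""
    delay_a = rng.expovariate(rate_a)
    delay_b = rng.expovariate(rate_b)
    return delay_a < delay_b and delay_a <= time_bound


def estimate(rate_a, rate_b, time_bound, epsilon=0.05, delta=0.05, seed=0):
    """Estimate the success probability from independent runs; returns (p_hat, runs)."""
    rng = random.Random(seed)
    n = required_runs(epsilon, delta)
    hits = sum(race_run(rng, rate_a, rate_b, time_bound) for _ in range(n))
    return hits / n, n


p_hat, runs = estimate(rate_a=2.0, rate_b=1.0, time_bound=1.0)
# Closed form for this toy race, used only as a sanity check:
# rate_a / (rate_a + rate_b) * (1 - exp(-(rate_a + rate_b) * time_bound))
exact = 2.0 / 3.0 * (1.0 - math.exp(-3.0))
```

The number of runs depends only on the requested precision `epsilon` and confidence `1 - delta`, not on the size of the model, which is the main appeal of statistical over exhaustive model checking.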