-
Quantifying Semantic Query Similarity for Automated Linear SQL Grading: A Graph-based Approach
Authors:
Leo Köberlein,
Dominik Probst,
Richard Lenz
Abstract:
Quantifying the semantic similarity between database queries is a critical challenge with broad applications, ranging from query log analysis to automated educational assessment of SQL skills. Traditional methods often rely solely on syntactic comparisons or are limited to checking for semantic equivalence.
This paper introduces a novel graph-based approach to measure the semantic dissimilarity between SQL queries. Queries are represented as nodes in an implicit graph, while the transitions between nodes are called edits, which are weighted by semantic dissimilarity. We employ shortest path algorithms to identify the lowest-cost edit sequence between two given queries, thereby defining a quantifiable measure of semantic distance.
A prototype implementation of this technique has been evaluated in an empirical study, which strongly suggests that our method provides more accurate and comprehensible grading than existing techniques. Moreover, the results indicate that our approach comes close to the quality of manual grading, making it a robust tool for diverse database query comparison tasks.
Submitted 21 March, 2024;
originally announced March 2024.
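The idea in the abstract above (queries as nodes of an implicit graph, weighted edits as edges, shortest path as distance) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: a query is modeled as a frozenset of clause strings, the only edits are adding or removing one clause from a fixed hypothetical vocabulary, and the weights in `EDIT_COST` are invented. Dijkstra's algorithm generates neighbors on demand, so the graph is never materialized.

```python
import heapq
from itertools import count

# Hypothetical clause vocabulary with invented edit weights (assumptions).
EDIT_COST = {
    "SELECT name": 1.0,
    "SELECT id": 1.0,
    "WHERE age > 18": 2.0,
    "ORDER BY name": 0.5,
}


def neighbors(query):
    """Yield (next_query, cost) for every single-clause edit of `query`."""
    for clause, cost in EDIT_COST.items():
        if clause in query:
            yield query - {clause}, cost  # edit: remove the clause
        else:
            yield query | {clause}, cost  # edit: add the clause


def semantic_distance(source, target):
    """Cost of the cheapest edit sequence from `source` to `target` (Dijkstra)."""
    tie = count()  # unique tie-breaker so heap entries never compare queries
    frontier = [(0.0, next(tie), source)]
    best = {source: 0.0}
    while frontier:
        dist, _, query = heapq.heappop(frontier)
        if query == target:
            return dist
        if dist > best.get(query, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for nxt, cost in neighbors(query):
            new_dist = dist + cost
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(frontier, (new_dist, next(tie), nxt))
    return float("inf")  # target not reachable with the available edits


reference = frozenset({"SELECT name", "WHERE age > 18", "ORDER BY name"})
submission = frozenset({"SELECT name"})
print(semantic_distance(submission, reference))  # prints 2.5
```

In this toy setting the submission is missing two clauses, so the cheapest path adds both (2.0 + 0.5). A real grading system would need a far richer edit set and weights that reflect actual semantic impact.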
-
DEUS: Distributed Electronic Patient File Update System
Authors:
Christoph P. Neumann,
Florian Rampp,
Richard Lenz
Abstract:
Inadequate availability of patient information is a major cause of medical errors and affects costs in healthcare. Traditional approaches to information integration in healthcare do not solve the problem. Applying a document-oriented paradigm to systems integration enables inter-institutional information exchange in healthcare. The goal of the proposed architecture is to provide information exchange between strictly autonomous healthcare institutions, bridging the gap between primary and secondary care. In a long-term healthcare data distribution scenario, the patient has to maintain sovereignty over any personal health information. Thus, the traditional publish-subscribe architecture is extended by a phase of human mediation within the data flow. DEUS essentially decouples the roles of information author and information publisher into distinct actors, resulting in a triangular data flow. The interaction scenario is motivated, and the significance of human mediation is discussed. DEUS provides a carefully distinguished actor and role model for mediated pub-sub. The data flow between the participants is factored into distinct phases of information interchange. The artefact model is decomposed into role-dependent constituent parts. Both a domain-specific (healthcare) terminology and a generic terminology are provided. From a technical perspective, the system design is presented. The sublayer for network transfer is highlighted, as well as the subsystem for human-machine interaction.
Submitted 24 November, 2023;
originally announced November 2023.
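The mediated publish-subscribe flow described in the abstract above, in which the author of a document is not the one who publishes it, can be sketched roughly as follows. This is an illustrative toy and not the DEUS design: the class and method names (`Mediator`, `contribute`, `release`) are hypothetical, and the patient is assumed to be the human who mediates between authors and subscribers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Mediator:
    """Hypothetical publisher role: holds contributions until a human releases them."""
    pending: Dict[int, str] = field(default_factory=dict)
    subscribers: List[Callable[[str], None]] = field(default_factory=list)
    _next_id: int = 0

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Register a subscriber that is notified about released documents."""
        self.subscribers.append(callback)

    def contribute(self, document: str) -> int:
        """Phase 1: an author hands over a document; nothing is published yet."""
        doc_id = self._next_id
        self._next_id += 1
        self.pending[doc_id] = document
        return doc_id

    def release(self, doc_id: int) -> None:
        """Phase 2: the mediator approves; only now do subscribers receive it."""
        document = self.pending.pop(doc_id)
        for notify in self.subscribers:
            notify(document)


patient = Mediator()
received: List[str] = []
patient.subscribe(received.append)

doc_id = patient.contribute("discharge letter")  # author -> mediator
assert received == []                             # not yet visible to subscribers
patient.release(doc_id)                           # mediator -> subscribers
print(received)  # prints ['discharge letter']
```

The point of the sketch is the separation of `contribute` and `release`: the author triggers the first, the mediating human the second, which gives the triangular author, publisher, subscriber flow.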
-
Homological Time Series Analysis of Sensor Signals from Power Plants
Authors:
Luciano Melodia,
Richard Lenz
Abstract:
In this paper, we use topological data analysis techniques to construct a suitable neural network classifier for the task of learning sensor signals of entire power plants according to their reference designation system. We use representations of persistence diagrams to derive necessary preprocessing steps and visualize the large amounts of data. We derive deep architectures with one-dimensional convolutional layers combined with stacked long short-term memories as residual networks suitable for processing the persistence features. We combine three separate sub-networks, which take as input the time series itself and a representation of the persistent homology for the zeroth and first dimension. We give a mathematical derivation for most of the hyper-parameters used. For validation, numerical experiments were performed with sensor data from four power plants of the same construction type.
Submitted 8 March, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.
-
Estimate of the Neural Network Dimension using Algebraic Topology and Lie Theory
Authors:
Luciano Melodia,
Richard Lenz
Abstract:
In this paper we present an approach to determine the smallest possible number of neurons in a layer of a neural network such that the topology of the input space can be learned sufficiently well. We introduce a general procedure based on persistent homology to investigate topological invariants of the manifold on which we suspect the data set to lie. We specify the required dimensions precisely, assuming that there is a smooth manifold on or near which the data are located. Furthermore, we require that this space is connected and has a commutative group structure in the mathematical sense. These assumptions allow us to derive a decomposition of the underlying space whose topology is well known. We use the representatives of the $k$-dimensional homology groups from the persistence landscape to determine an integer dimension for this decomposition. This number is the dimension of the embedding that is capable of capturing the topology of the data manifold. We derive the theory and validate it experimentally on toy data sets.
Submitted 8 March, 2022; v1 submitted 6 April, 2020;
originally announced April 2020.
-
Persistent Homology as Stopping-Criterion for Voronoi Interpolation
Authors:
Luciano Melodia,
Richard Lenz
Abstract:
In this study, Voronoi interpolation is used to interpolate a set of points drawn from a topological space with higher homology groups on its filtration. The technique is based on Voronoi tessellation, which induces a natural dual map to the Delaunay triangulation. We take advantage of this fact by calculating the persistent homology on it after each iteration to capture the changing topology of the data. The boundary points are identified as critical. The Bottleneck and Wasserstein distances serve as measures of quality between the original point set and the interpolation. If the norm of two distances exceeds a heuristically determined threshold, the algorithm terminates. We give the theoretical basis for this approach and justify its validity with numerical experiments.
Submitted 9 March, 2022; v1 submitted 8 November, 2019;
originally announced November 2019.
-
The Mehler-Fock Transform and some Applications in Texture Analysis and Color Processing
Authors:
Reiner Lenz
Abstract:
Many stochastic processes are defined on special geometrical objects like spheres and cones. We describe how tools from harmonic analysis, i.e. Fourier analysis on groups, can be used to investigate probability density functions (pdfs) on groups and homogeneous spaces. We consider the special case of the Lorentz group SU(1,1) and the unit disk with its hyperbolic geometry, but the procedure can be generalized to a much wider class of Lie groups. We mainly concentrate on the Mehler-Fock transform, which is the radial part of the Fourier transform on the disk. Some of the characteristic features of this transform are the relation to group convolutions, the isometry between signal and transform space, the relation to the Laplace-Beltrami operator and the relation to group representation theory. We give an overview of these properties and their applications in signal processing, and illustrate the theory with two examples from low-level vision and color image processing.
Submitted 14 December, 2016;
originally announced December 2016.
-
Anfrage-getriebener Wissenstransfer zur Unterstuetzung von Datenanalysten (Query-Driven Knowledge Transfer to Support Data Analysts)
Authors:
Andreas M. Wahl,
Gregor Endler,
Peter K. Schwab,
Sebastian Herbst,
Richard Lenz
Abstract:
In larger organizations, multiple teams of data scientists have to integrate data from heterogeneous data sources as preparation for data analysis tasks. Writing effective analytical queries requires data scientists to have in-depth knowledge of the existence, semantics, and usage context of data sources. Once gathered, such knowledge is informally shared within a specific team of data scientists, but usually is neither formalized nor shared with other teams. Potential synergies remain unused. We therefore introduce a novel approach which extends data management systems with additional knowledge-sharing capabilities to facilitate user collaboration without altering established data analysis workflows. Relevant collective knowledge from the query log is extracted to support data source discovery and incremental data integration. Extracted knowledge is formalized and provided at query time.
Submitted 20 October, 2016;
originally announced October 2016.
-
Modelling, Measuring and Compensating Color Weak Vision
Authors:
Satoshi Oshima,
Rica Mochizuki,
Reiner Lenz,
Jinhui Chao
Abstract:
We use methods from Riemannian geometry to investigate transformations between the color spaces of color-normal and color-weak observers. The two main applications are the simulation of the perception of a color-weak observer for a color-normal observer and the compensation of color images in such a way that a color-weak observer has approximately the same perception as a color-normal observer. The metrics in the color spaces of interest are characterized with the help of ellipsoids defined by the just-noticeable differences between colors, which are measured with the help of color-matching experiments. The constructed mappings are isometries of Riemannian spaces that preserve the perceived color differences for both observers. Of the two approaches to building such an isometry, we introduce normal coordinates in Riemannian spaces as a tool to construct a global color-weak compensation map. Compared to previously used methods, this method is free from approximation errors due to local linearizations, and it avoids the problem of shifting locations of the origin of the local coordinate system. We analyse the variations of the Riemannian metrics for different observers obtained from new color-matching experiments and describe three variations of the basic method. The performance of the methods is evaluated with the help of semantic differential (SD) tests.
Submitted 22 October, 2015;
originally announced October 2015.
-
Saccadic Eye Movements and the Generalized Pareto Distribution
Authors:
Reiner Lenz
Abstract:
We describe a statistical analysis of the eye tracker measurements in a database with 15 observers viewing 1003 images under free-viewing conditions. In contrast to the common approach of investigating the properties of the fixation points, we analyze the properties of the transition phases between fixations. We introduce hyperbolic geometry as a tool to measure the step length between consecutive eye positions. We show that the step lengths, measured in hyperbolic and Euclidean geometry, follow a generalized Pareto distribution. The results based on the hyperbolic distance are more robust than those based on Euclidean geometry. We show how the structure of the space of generalized Pareto distributions can be used to characterize and identify individual observers.
Submitted 24 June, 2014;
originally announced June 2014.
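To make the difference between hyperbolic and Euclidean step lengths in the abstract above concrete, here is a small sketch. It rests on assumptions the abstract does not state: gaze positions are taken to be already rescaled into the open unit disk and encoded as complex numbers, and the Poincaré disk model with curvature -1 is used, where the distance is d(p, q) = 2 artanh(|p - q| / |1 - conj(p) q|). The function names are hypothetical.

```python
import math


def euclidean_step(p, q):
    """Ordinary Euclidean distance between two complex points."""
    return abs(q - p)


def hyperbolic_step(p, q):
    """Poincare-disk distance between complex points with |p|, |q| < 1."""
    if abs(p) >= 1 or abs(q) >= 1:
        raise ValueError("points must lie strictly inside the unit disk")
    return 2.0 * math.atanh(abs(p - q) / abs(1 - p.conjugate() * q))


def step_lengths(positions, step=hyperbolic_step):
    """Step length between each pair of consecutive positions."""
    return [step(a, b) for a, b in zip(positions, positions[1:])]


# Two steps of equal Euclidean length, one near the center, one near the rim.
trajectory = [0.0 + 0.0j, 0.1 + 0.0j, 0.8 + 0.0j, 0.9 + 0.0j]
print(step_lengths(trajectory, euclidean_step))
print(step_lengths(trajectory, hyperbolic_step))
```

The first and last steps have the same Euclidean length, but the hyperbolic length of the step near the rim is larger, because distances in the Poincaré disk grow toward the boundary. Fitting a generalized Pareto distribution to such step lengths would be a separate step that is not shown here.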