Search | arXiv e-print repository

Duplicate Detection as a Service

Authors: Juliette Opdenplatz, Umutcan Şimşek, Dieter Fensel

Abstract: Completeness of a knowledge graph is an important quality dimension and a factor in how well an application that makes use of it performs. Completeness can be improved by performing knowledge enrichment. Duplicate detection aims to find identity links between the instances of knowledge graphs and is a fundamental subtask of knowledge enrichment. Current solutions to the problem require expert knowledge of the tool and of the knowledge graph they are applied to, which users might not have. We present our service-based approach to the duplicate detection task, which provides an easy-to-use, no-code solution that is still competitive with the state of the art and has recently been adopted in an industrial context. The evaluation will be based on several frequently used test scenarios.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:1906.06492

A formal approach for customization of schema.org based on SHACL

Authors: Umutcan Şimşek, Kevin Angele, Elias Kärle, Oleksandra Panasiuk, Dieter Fensel

Abstract: Schema.org is a widely adopted vocabulary for semantic annotation of content and data. However, its generic nature makes it complicated for data publishers to pick the right types and properties for a specific domain and task. In this paper we propose a formal approach, a domain specification process that generates domain-specific patterns by applying operators implemented in SHACL to the schema.org vocabulary. These patterns can support knowledge generation and assessment processes for specific domains and tasks. We demonstrate our approach with use cases in the tourism domain.

Submitted 15 June, 2019; originally announced June 2019.

Comments: Technical Report

arXiv:1904.01353

Verification and Validation of Semantic Annotations

Authors: Oleksandra Panasiuk, Omar Holzknecht, Umutcan Şimşek, Elias Kärle, Dieter Fensel

Abstract: In this paper, we propose a framework to perform verification and validation of semantically annotated data. The annotations, extracted from websites, are verified against the schema.org vocabulary and Domain Specifications to ensure the syntactic correctness and completeness of the annotations. The Domain Specifications allow checking the compliance of annotations against corresponding domain-specific constraints. The validation mechanism will detect errors and inconsistencies between the content of the analyzed schema.org annotations and the content of the web pages where the annotations were found.

Submitted 20 May, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: Accepted for the A.P. Ershov Informatics Conference 2019 (the PSI Conference Series, 12th edition) proceedings

arXiv:1903.04969

RocketRML - A NodeJS implementation of a use-case specific RML mapper

Authors: Umutcan Şimşek, Elias Kärle, Dieter Fensel

Abstract: The creation of Linked Data from raw data sources is, in theory, no rocket science (pun intended). Depending on the nature of the input and the mapping technology in use, it can become quite a tedious task. For our work on mapping real-life touristic data to the schema.org vocabulary we used RML, but soon found that the existing Java mapper implementations had reached their limits and were not sufficient for our use cases. In this paper we describe a new implementation of an RML mapper. Written with the JavaScript-based NodeJS framework, it performs quite well for our use cases, where we work with large XML and JSON files. The performance tests and the execution of the RML test cases have shown that the implementation has great potential to perform heavy mapping tasks in reasonable time, but it comes with some limitations regarding JOINs, Named Graphs and inputs other than XML and JSON, which is acceptable at the moment given the nature of our use cases.

Submitted 12 March, 2019; originally announced March 2019.

Comments: 8 pages, submitted to KGB Workshop 2019 at ESWC

arXiv:1807.01292

Intent Generation for Goal-Oriented Dialogue Systems based on Schema.org Annotations

Authors: Umutcan Şimşek, Dieter Fensel

Abstract: Goal-oriented dialogue systems typically communicate with a backend (e.g. database, Web API) to complete certain tasks to reach a goal. The intents that a dialogue system can recognize are mostly added to the system statically by the developer. For an open dialogue system that can work on more than a small set of well-curated data and APIs, this manual intent creation will not scale. In this paper, we introduce a straightforward methodology for intent creation based on semantic annotation of data and services on the web. With this method, the Natural Language Understanding (NLU) module of a goal-oriented dialogue system can adapt to newly introduced APIs without requiring heavy developer involvement. We were able to extract intents and the necessary slots to be filled from schema.org annotations. We were also able to create a set of initial training sentences for classifying user utterances into the generated intents. We demonstrate our approach on the NLU module of a state-of-the-art dialogue system development framework.

Submitted 3 July, 2018; originally announced July 2018.

Comments: Presented at the First International Workshop on Chatbots, co-located with ICWSM 2018 in Stanford, CA

arXiv:1805.05744

Building an Ecosystem for the Tyrolean Tourism Knowledge Graph

Authors: Elias Kärle, Umutcan Şimşek, Oleksandra Panasiuk, Dieter Fensel

Abstract: The introduction of the schema.org vocabulary was a big step towards making websites machine-readable and machine-understandable. Due to schema.org's RDF-like nature, storing annotations in a graph database is easy and efficient. In this paper the authors show how they gather touristic data in the Austrian region of Tirol and provide this data publicly in a knowledge graph. The definition of subsets of the vocabulary is followed by providing means to map data sources efficiently to schema.org and then store the annotated content in the graph. To showcase the consumption of the touristic data, four scenarios are described which use the knowledge graph for real-life applications and data analysis.

Submitted 4 July, 2018; v1 submitted 15 May, 2018; originally announced May 2018.

arXiv:1805.05479

Machine Readable Web APIs with Schema.org Action Annotations

Authors: Umutcan Şimşek, Elias Kärle, Dieter Fensel

Abstract: The schema.org initiative led by the four major search engines curates a vocabulary for describing web content. The number of semantic annotations on the web is increasing, mostly due to the industrial incentives provided by those search engines. The annotations are consumed not only by search engines but also by other automated agents like intelligent personal assistants (IPAs). However, annotating data alone is not enough for automated agents to reach their full potential. Web APIs should also be annotated to automate service consumption, so that IPAs can complete tasks like booking a hotel room or buying a ticket for an event on the fly. Although there has been a vast amount of effort in the semantic web services field, the approaches did not gain much adoption outside of academia, mainly due to a lack of concrete incentives and steep learning curves. In this paper, we suggest a lightweight, bottom-up approach based on schema.org actions to annotate Web APIs. We analyse the schema.org vocabulary in the scope of the lightweight semantic web services literature and propose extensions where necessary. We show that schema.org actions could be a suitable vocabulary for Web API description. We demonstrate our work by annotating existing Web APIs of accommodation service providers. Additionally, we briefly demonstrate how these APIs can be used dynamically, for example, by a dialogue system.

Submitted 14 May, 2018; originally announced May 2018.

Comments: Submitted to SEMANTICS 2018 Conference

arXiv:1802.05948

Analysis of Schema.org Usage in the Tourism Domain

Authors: Boran Taylan Balcı, Umutcan Şimşek, Elias Kärle, Dieter Fensel

Abstract: Schema.org is an initiative founded in 2011 by the four big search engines Bing, Google, Yahoo! and Yandex. The goal of the initiative is to publish and maintain the schema.org vocabulary in order to facilitate the publication of structured data on the web, which can enable the implementation of automated agents like intelligent personal assistants and chatbots. In this paper, the usage of schema.org in the tourism domain between 2013 and 2016 is analysed. The analysis shows the adoption of schema.org, which indicates how well the tourism sector is prepared for a web that targets automated agents. The results show that the adoption of schema.org types and properties has grown over the years. While the US dominates the annotation numbers, a drastic drop in the US share was observed in 2016. Poorly rated businesses were encountered more often in the 2016 results than in previous years.

Submitted 16 February, 2018; originally announced February 2018.

Comments: Presented at the ENTER 2018 conference in Jönköping

Journal ref: e-Review of Tourism Research, ENTER 2018: Volume 9 Research Notes

arXiv:1711.03425

Defining Tourism Domains for Semantic Annotation of Web Content

Authors: Oleksandra Panasiuk, Elias Kärle, Umutcan Simsek, Dieter Fensel

Abstract: Schema.org is an initiative by Bing, Google, Yahoo! and Yandex that publishes a vocabulary for creating structured data markup on web pages. The use of schema.org is necessary to increase the visibility of a website, making the content understandable to different automated agents (e.g. search engines, chatbots or personal assistant systems). The domain specifications are the subsets of types from the schema.org vocabulary, each associated with a set of properties. The challenge is to choose the right classes and properties for an annotation in a given domain. In this paper we address the problem of finding a subset of types and properties for complete and correct annotation of different tourism domains. The approach provides a collection of domain specifications that were built based on domain analysis and vocabulary selection.

Submitted 16 February, 2018; v1 submitted 9 November, 2017; originally announced November 2017.

Comments: ENTER 2018 Conference on Information and Communication Technologies in Tourism. Published as Research Notes in e-Review of Tourism Research, vol. 9

Journal ref: e-Review of Tourism Research (eRTR), volume 9. https://ertr.tamu.edu/files/2018/01/ENTER2018_Submission_94-ok.pdf

arXiv:1707.06433

Data Aggregation, Fusion and Recommendations for Strengthening Citizens Energy-aware Behavioural Profiles

Authors: Eleni Fotopoulou, Anastasios Zafeiropoulos, Fernando Terroso, Aurora Gonzalez, Antonio Skarmeta, Umutcan Şimşek, Anna Fensel

Abstract: In this paper, the ENTROPY platform, an IT ecosystem for supporting energy efficiency in buildings through behavioural change of the occupants, is presented. The ENTROPY platform aims to provide a set of mechanisms for accelerating the adoption of energy-efficient practices by increasing the energy awareness and energy-saving potential of the occupants. The platform takes advantage of novel sensor networking technologies for efficient sensor data aggregation, semantic web technologies for unified data representation, machine learning mechanisms for getting insights from the available data, and recommendation mechanisms for providing personalised content to end users. These technologies are combined and provided through an integrated platform that aims to change occupants' behaviour with regard to their energy consumption profiles.

Submitted 20 July, 2017; originally announced July 2017.

Comments: To appear in the proceedings of Global IoT Summit 2017

arXiv:1706.10067

semantify.it, a Platform for Creation, Publication and Distribution of Semantic Annotations

Authors: Elias Kärle, Umutcan Şimşek, Dieter Fensel

Abstract: The application of semantic technologies to content on the web is, in many regards, important and urgent. Search engines, chatbots, intelligent personal assistants and other technologies increasingly rely on content published as semantic structured data. Yet the process of creating this kind of data is still complicated and widely unknown. The semantify.it platform implements an approach to solve three of the most challenging questions regarding the publication of structured semantic data, namely: a) what vocabulary to use, b) how to create annotation files, and c) how to publish or integrate annotations within a website without programming. This paper presents the idea and the development of the semantify.it platform. It demonstrates that the creation process of semantically annotated data does not have to be hard, shows use cases and pilot users of the created software, and outlines where the development of this platform or similar projects may lead in the future.

Submitted 1 October, 2017; v1 submitted 30 June, 2017; originally announced June 2017.

arXiv:1706.06384

Domain Specific Semantic Validation of Schema.org Annotations

Authors: Umutcan Şimşek, Elias Kärle, Omar Holzknecht, Dieter Fensel

Abstract: Since its unveiling in 2011, schema.org has become the de facto standard for publishing semantically described structured data on the web, typically in the form of web page annotations. The increasing adoption of schema.org facilitates the growth of the web of data, as well as the development of automated agents that operate on this data. Schema.org is a large heterogeneous vocabulary that covers many domains. This is obviously not a bug but a feature, since schema.org aims to describe almost everything on the web, and the web is huge. However, the heterogeneity of schema.org may cause a side effect: the challenge of picking the right classes and properties for an annotation in a certain domain, as well as keeping the annotation semantically consistent. In this work, we introduce our rule-based approach and an implementation of it for validating schema.org annotations from two aspects: (a) the completeness of the annotations in terms of a specified domain, and (b) the semantic consistency of the values based on pre-defined rules. We demonstrate our approach in the tourism domain.

Submitted 15 September, 2017; v1 submitted 20 June, 2017; originally announced June 2017.

Comments: Accepted to PSI 2017 Conference in Moscow, Russia. 13 pages, 4 figures, 3 listings

arXiv:1706.05995

Complete Semantics to empower Touristic Service Providers

Authors: Zaenal Akbar, Elias Kärle, Oleksandra Panasiuk, Umutcan Şimşek, Ioan Toma, Dieter Fensel

Abstract: The tourism industry has a significant impact on the world's economy, contributing 10.2% of the world's gross domestic product in 2016. It has become a very competitive industry, where having a strong online presence is an essential aspect of business success. To achieve this goal, the proper use of the latest Web technologies, particularly schema.org annotations, is crucial. In this paper, we present our effort to improve the online visibility of touristic service providers in the region of Tyrol, Austria, by creating and deploying a substantial amount of semantic annotations according to schema.org, a widely used vocabulary for structured data on the Web. We started our work with Tourismusverband (TVB) Mayrhofen-Hippach and all touristic service providers in the Mayrhofen-Hippach region and applied the same approach to other TVBs and regions, as well as other use cases. The rationale for doing this is straightforward. Having schema.org annotations enables search engines to understand the content better and provide better results for end users, and it also enables various intelligent applications to utilize them. As a direct consequence, the region of Tyrol and its touristic service providers increase their online visibility and decrease their dependency on intermediaries, i.e. Online Travel Agencies (OTAs).

Submitted 15 September, 2017; v1 submitted 19 June, 2017; originally announced June 2017.

Comments: 18 pages, 6 figures

Showing 1–13 of 13 results for author: Şimşek, U