-
Learning Local Causal World Models with State Space Models and Attention
Authors:
Francesco Petri,
Luigi Asprino,
Aldo Gangemi
Abstract:
World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Despite their impressive performance, many solutions fail to learn a causal representation of the environment they are trying to model, which would be necessary to gain a deep enough understanding of the world to perform complex tasks. With this work, we aim to broaden the research at the intersection of causality theory and neural world modelling by assessing the potential for causal discovery of the State Space Model (SSM) architecture, which has been shown to have several advantages over the widespread Transformer. We show empirically that, compared to an equivalent Transformer, an SSM can model the dynamics of a simple environment and learn a causal model at the same time with equivalent or better performance, thus paving the way for further experiments that lean into the strength of SSMs and further enhance them with causal awareness.
Submitted 4 May, 2025;
originally announced May 2025.
-
Assessing the Capability of Large Language Models for Domain-Specific Ontology Generation
Authors:
Anna Sofia Lippolis,
Mohammad Javad Saeedizade,
Robin Keskisarkka,
Aldo Gangemi,
Eva Blomqvist,
Andrea Giovanni Nuzzolese
Abstract:
Large Language Models (LLMs) have shown significant potential for ontology engineering. However, it is still unclear to what extent they are applicable to the task of domain-specific ontology generation. In this study, we explore the application of LLMs for automated ontology generation and evaluate their performance across different domains. Specifically, we investigate the generalizability of two state-of-the-art LLMs, DeepSeek and o1-preview, both equipped with reasoning capabilities, by generating ontologies from a set of competency questions (CQs) and related user stories. Our experimental setup comprises six distinct domains carried out in existing ontology engineering projects and a total of 95 curated CQs designed to test the models' reasoning for ontology engineering. Our findings show that with both LLMs, the performance of the experiments is remarkably consistent across all domains, indicating that these methods are capable of generalizing ontology generation tasks irrespective of the domain. These results highlight the potential of LLM-based approaches in achieving scalable and domain-agnostic ontology construction and lay the groundwork for further research into enhancing automated reasoning and knowledge representation techniques.
Submitted 24 April, 2025;
originally announced April 2025.
-
Enhancing multimodal analogical reasoning with Logic Augmented Generation
Authors:
Anna Sofia Lippolis,
Andrea Giovanni Nuzzolese,
Aldo Gangemi
Abstract:
Recent advances in Large Language Models have demonstrated their capabilities across a variety of tasks. However, automatically extracting implicit knowledge from natural language remains a significant challenge, as machines lack active experience with the physical world. Given this scenario, semantic knowledge graphs can serve as conceptual spaces that guide the automated text generation reasoning process to achieve more efficient and explainable results. In this paper, we apply a logic-augmented generation (LAG) framework that leverages the explicit representation of a text through a semantic knowledge graph and applies it in combination with prompt heuristics to elicit implicit analogical connections. This method generates extended knowledge graph triples representing implicit meaning, enabling systems to reason on unlabeled multimodal data regardless of the domain. We validate our work through three metaphor detection and understanding tasks across four datasets, as they require deep analogical reasoning capabilities. The results show that this integrated approach surpasses current baselines, performs better than humans in understanding visual metaphors, and enables more explainable reasoning processes, though it still has inherent limitations in metaphor understanding, especially for domain-specific metaphors. Furthermore, we present a thorough error analysis, discussing issues with metaphorical annotations and current evaluation methods.
Submitted 13 June, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
Ontology Generation using Large Language Models
Authors:
Anna Sofia Lippolis,
Mohammad Javad Saeedizade,
Robin Keskisärkkä,
Sara Zuppiroli,
Miguel Ceriani,
Aldo Gangemi,
Eva Blomqvist,
Andrea Giovanni Nuzzolese
Abstract:
The ontology engineering process is complex, time-consuming, and error-prone, even for experienced ontology engineers. In this work, we investigate the potential of Large Language Models (LLMs) to provide effective OWL ontology drafts directly from ontological requirements described using user stories and competency questions. Our main contribution is the presentation and evaluation of two new prompting techniques for automated ontology development: Memoryless CQbyCQ and Ontogenia. We also emphasize the importance of three structural criteria for ontology assessment, alongside expert qualitative evaluation, highlighting the need for a multi-dimensional evaluation in order to capture the quality and usability of the generated ontologies. Our experiments, conducted on a benchmark dataset of ten ontologies with 100 distinct CQs and 29 different user stories, compare the performance of three LLMs using the two prompting techniques. The results demonstrate improvements over the current state-of-the-art in LLM-supported ontology engineering. More specifically, the model OpenAI o1-preview with Ontogenia produces ontologies of sufficient quality to meet the requirements of ontology engineers, significantly outperforming novice ontology engineers in modelling ability. However, we still note some common mistakes and variability of result quality, which is important to take into account when using LLMs for ontology authoring support. We discuss these limitations and propose directions for future research.
Submitted 7 March, 2025;
originally announced March 2025.
-
Logic Augmented Generation
Authors:
Aldo Gangemi,
Andrea Giovanni Nuzzolese
Abstract:
Semantic Knowledge Graphs (SKGs) face challenges with scalability, flexibility, contextual understanding, and handling unstructured or ambiguous information. However, they offer formal and structured knowledge enabling highly interpretable and reliable results by means of reasoning and querying. Large Language Models (LLMs) overcome those limitations, making them suitable for open-ended tasks and unstructured environments. Nevertheless, LLMs are neither interpretable nor reliable. To solve the dichotomy between LLMs and SKGs, we envision Logic Augmented Generation (LAG), which combines the benefits of the two worlds. LAG uses LLMs as Reactive Continuous Knowledge Graphs that can generate potentially infinite relations and tacit knowledge on demand. SKGs are key for injecting a discrete heuristic dimension with clear logical and factual boundaries. We exemplify LAG in two tasks of collective intelligence, i.e., medical diagnostics and climate projections. Understanding the properties and limitations of LAG, which are still mostly unknown, is of utmost importance for enabling a variety of tasks involving tacit knowledge in order to provide interpretable and effective results.
Submitted 14 January, 2025; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Neurosymbolic Graph Enrichment for Grounded World Models
Authors:
Stefano De Giorgis,
Aldo Gangemi,
Alessandro Russo
Abstract:
The development of artificial intelligence systems capable of understanding and reasoning about complex real-world scenarios is a significant challenge. In this work we present a novel approach to enhance and exploit LLM reactive capability to address complex problems and interpret deeply contextual real-world meaning. We introduce a method and a tool for creating a multimodal, knowledge-augmented formal representation of meaning that combines the strengths of large language models with structured semantic representations. Our method begins with an image input, utilizing state-of-the-art large language models to generate a natural language description. This description is then transformed into an Abstract Meaning Representation (AMR) graph, which is formalized and enriched with logical design patterns, and layered semantics derived from linguistic and factual knowledge bases. The resulting graph is then fed back into the LLM to be extended with implicit knowledge activated by complex heuristic learning, including semantic implicatures, moral values, embodied cognition, and metaphorical representations. By bridging the gap between unstructured language models and formal semantic structures, our method opens new avenues for tackling intricate problems in natural language understanding and reasoning.
Submitted 19 November, 2024;
originally announced November 2024.
-
Explainable Moral Values: a neuro-symbolic approach to value classification
Authors:
Nicolas Lazzari,
Stefano De Giorgis,
Aldo Gangemi,
Valentina Presutti
Abstract:
This work explores the integration of ontology-based reasoning and Machine Learning techniques for explainable value classification. Building on an ontological formalization of moral values as in the Moral Foundations Theory, based on the DnS Ontology Design Pattern, the sandra neuro-symbolic reasoner is used to infer values (formalized as descriptions) that are satisfied by a certain sentence. Sentences, alongside their structured representation, are automatically generated using an open-source Large Language Model. The inferred descriptions are used to automatically detect the value associated with a sentence. We show that relying only on the reasoner's inference yields explainable classification comparable to other more complex approaches. We show that combining the reasoner's inferences with distributional semantics methods largely outperforms all the baselines, including complex models based on neural network architectures. Finally, we build a visualization tool to explore the potential of theory-based values classification, which is publicly available at http://xmv.geomeaning.com/.
Submitted 16 October, 2024;
originally announced October 2024.
-
Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
Authors:
Luana Bulla,
Aldo Gangemi,
Misael Mongiovì
Abstract:
The task of detecting moral values in text has significant implications in various fields, including natural language processing, social sciences, and ethical decision-making. Previously proposed supervised models often suffer from overfitting, leading to hyper-specialized moral classifiers that struggle to perform well on data from different domains. To address this issue, we introduce novel systems that leverage abstract concepts and common-sense knowledge acquired from Large Language Models and Natural Language Inference models during previous stages of training on multiple data sources. By doing so, we aim to develop versatile and robust methods for detecting moral values in real-world scenarios. Our approach uses the GPT 3.5 model as a zero-shot ready-made unsupervised multi-label classifier for moral values detection, eliminating the need for explicit training on labeled data. We compare it with a smaller NLI-based zero-shot model. The results show that the NLI approach achieves competitive results compared to the Davinci model. Furthermore, we conduct an in-depth investigation of the performance of supervised systems in the context of cross-domain multi-label moral value detection. This involves training supervised models on different domains to explore their effectiveness in handling data from different sources and comparing their performance with the unsupervised methods. Our contributions encompass a thorough analysis of both supervised and unsupervised methodologies for cross-domain value detection. We introduce the Davinci model as a state-of-the-art zero-shot unsupervised moral values classifier, pushing the boundaries of moral value detection without the need for explicit training on labeled data. Additionally, we perform a comparative evaluation of our approach with the supervised models, shedding light on their respective strengths and weaknesses.
Submitted 6 June, 2024;
originally announced June 2024.
-
Transformers and Slot Encoding for Sample Efficient Physical World Modelling
Authors:
Francesco Petri,
Luigi Asprino,
Aldo Gangemi
Abstract:
World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer architecture to the problem of world modelling from video input show notable improvements in sample efficiency. However, existing approaches tend to work only at the image level, thus disregarding that the environment is composed of objects interacting with each other. In this paper, we propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene. We describe the resulting neural architecture and report experimental results showing an improvement over the existing solutions in terms of sample efficiency and a reduction of the variation of the performance over the training examples. The code for our architecture and experiments is available at https://github.com/torchipeppo/transformers-and-slot-encoding-for-wm
Submitted 30 May, 2024;
originally announced May 2024.
-
Sandra -- A Neuro-Symbolic Reasoner Based On Descriptions And Situations
Authors:
Nicolas Lazzari,
Stefano De Giorgis,
Aldo Gangemi,
Valentina Presutti
Abstract:
This paper presents sandra, a neuro-symbolic reasoner combining vectorial representations with deductive reasoning. Sandra builds a vector space constrained by an ontology and performs reasoning over it. The geometric nature of the reasoner allows its combination with neural networks, bridging the gap with symbolic knowledge representations. Sandra is based on the Description and Situation (DnS) ontology design pattern, a formalization of frame semantics. Given a set of facts (a situation), it allows inferring all possible perspectives (descriptions) that can provide a plausible interpretation for it, even in the presence of incomplete information. We prove that our method is correct with respect to the DnS model. We experiment with two different tasks and their standard benchmarks, demonstrating that, without increasing complexity, sandra (i) outperforms all the baselines, (ii) provides interpretability in the classification process, and (iii) allows control over the vector space, which is designed a priori.
Submitted 25 March, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
EFO: the Emotion Frame Ontology
Authors:
Stefano De Giorgis,
Aldo Gangemi
Abstract:
Emotions are a subject of intense debate in various disciplines. Despite the proliferation of theories and definitions, there is still no consensus on what emotions are, and how to model the different concepts involved when we talk about - or categorize - them. In this paper, we propose an OWL frame-based ontology of emotions: the Emotion Frames Ontology (EFO). EFO treats emotions as semantic frames, with a set of semantic roles that capture the different aspects of emotional experience. EFO follows pattern-based ontology design, and is aligned to the DOLCE foundational ontology. EFO is used to model multiple emotion theories, which can be cross-linked as modules in an Emotion Ontology Network. In this paper, we exemplify it by modeling Ekman's Basic Emotions (BE) Theory as an EFO-BE module, and demonstrate how to perform automated inferences on the representation of emotion situations. EFO-BE has been evaluated by lexicalizing the BE emotion frames from within the Framester knowledge graph, and implementing a graph-based emotion detector from text. In addition, an EFO integration of multimodal datasets, including emotional speech and emotional face expressions, has been performed to enable further inquiry into crossmodal emotion semantics.
Submitted 19 January, 2024;
originally announced January 2024.
-
Streamlining Knowledge Graph Construction with a façade: The SPARQL Anything project
Authors:
Luigi Asprino,
Enrico Daga,
Justin Dowdy,
Paul Mulholland,
Aldo Gangemi,
Marco Ratta
Abstract:
What should a data integration framework for knowledge engineers look like? Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied to SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in plain SPARQL 1.1, by overloading the SERVICE clause. SPARQL Anything supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCx, Bibtex). Features include querying Web APIs with high flexibility, parametrised queries, and chaining multiple transformations into complex pipelines. In this paper, we describe the design rationale and software architecture of the SPARQL Anything system. We provide references to an extensive set of reusable, real-world scenarios from various application domains. We report on the value-to-users of the founding assumptions of its design, compared to alternative solutions, through a community survey and a field report from the industry.
Submitted 25 October, 2023;
originally announced October 2023.
-
DOLCE: A Descriptive Ontology for Linguistic and Cognitive Engineering
Authors:
Stefano Borgo,
Roberta Ferrario,
Aldo Gangemi,
Nicola Guarino,
Claudio Masolo,
Daniele Porello,
Emilio M. Sanfilippo,
Laure Vieu
Abstract:
DOLCE, the first top-level (foundational) ontology to be axiomatized, has remained stable for twenty years and today is broadly used in a variety of domains. DOLCE is inspired by cognitive and linguistic considerations and aims to model a commonsense view of reality, like the one human beings exploit in everyday life in areas as diverse as socio-technical systems, manufacturing, financial transactions and cultural heritage. DOLCE clearly lists the ontological choices it is based upon, relies on philosophical principles, is richly formalized, and is built according to well-established ontological methodologies, e.g. OntoClean. Because of these features, it has inspired most of the existing top-level ontologies and has been used to develop or improve standards and public domain resources (e.g. CIDOC CRM, DBpedia and WordNet). Being a foundational ontology, DOLCE is not directly concerned with domain knowledge. Its purpose is to provide the general categories and relations needed to give a coherent view of reality, to integrate domain knowledge, and to mediate across domains. In these 20 years DOLCE has shown that applied ontologies can be stable and that interoperability across reference and domain ontologies is a reality. This paper briefly introduces the ontology and shows how to use it on a few modeling cases.
Submitted 3 August, 2023;
originally announced August 2023.
-
The Music Note Ontology
Authors:
Andrea Poltronieri,
Aldo Gangemi
Abstract:
In this paper we propose the Music Note Ontology, an ontology for modelling music notes and their realisation. The ontology addresses the relation between a note represented in a symbolic representation system, and its realisation, i.e. a musical performance. This work therefore aims to solve the modelling and representation issues that arise when analysing the relationships between abstract symbolic features and the corresponding physical features of an audio signal. The ontology is composed of three different Ontology Design Patterns (ODP), which model the structure of the score (Score Part Pattern), the note in the symbolic notation (Music Note Pattern) and its realisation (Musical Object Pattern).
Submitted 30 March, 2023;
originally announced April 2023.
-
That's All Folks: a KG of Values as Commonsense Social Norms and Behaviors
Authors:
Stefano De Giorgis,
Aldo Gangemi
Abstract:
Values, as intended in ethics, determine the shape and validity of moral and social norms, grounding our everyday individual and community behavior on commonsense knowledge. Formalising latent moral content in human interaction is an appealing perspective that would enable a deeper understanding of both social dynamics and individual cognitive and behavioral dimensions. To tackle this problem, several theoretical frameworks offer different value models and organize them into different taxonomies. The problem with the most widely used theories is that they adopt a culture-independent perspective, while many entities that are considered "values" are grounded in commonsense knowledge and expressed in everyday life interaction. We propose here two ontological modules: FOLK, an ontology for values intended in their broad sense, and That's All Folks, a module for lexical and factual folk value triggers. Their purpose is to complement the main theories by providing a method for identifying the values that are not contemplated by the major value theories, but which nonetheless play a key role in daily human interactions and shape social structures, cultural biases, and personal beliefs. The resource is tested by performing automatic detection of values from text with a frame-based approach.
Submitted 1 March, 2023;
originally announced March 2023.
-
Towards a Privacy-Preserving Dispute Resolution Protocol on Ethereum
Authors:
Andrea Gangemi,
Aida Manzano Kharman
Abstract:
We present a new dispute resolution protocol that can be built on the Ethereum blockchain. Unlike existing applications such as Kleros, privacy is ensured by design through the use of the zero-knowledge protocols Semaphore and MACI (Minimal Anti-Collusion Infrastructure), which provide, among other things, resistance to Sybil-like attacks and corruption. Also unlike Kleros, dispute resolution is guaranteed despite the users having the final say. Moreover, the proposed model does not use a native token on the platform, but aims to reward stakeholders through a social incentive mechanism based on soulbound tokens, introduced by Weyl, Ohlhaver, and Buterin in 2022. Users with these tokens will be considered trustworthy and will have the ability to govern the platform. As far as we know, this is one of the first blockchain projects that seeks to introduce social governance rather than one based on economic incentives.
Submitted 24 November, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Is Your Model Sensitive? SPeDaC: A New Benchmark for Detecting and Classifying Sensitive Personal Data
Authors:
Gaia Gambarelli,
Aldo Gangemi,
Rocco Tripodi
Abstract:
In recent years, there has been an exponential growth of applications, including dialogue systems, that handle sensitive personal information. This has brought to light the extremely important issue of personal data protection in virtual environments. Sensitive Information Detection (SID) has been approached across different domains and languages in the literature. However, if we refer to the personal data domain, the absence of a shared benchmark or of an available labeled resource makes comparison with the state of the art difficult. We introduce and release SPeDaC, a new annotated resource for the identification of sensitive personal data categories in the English language. SPeDaC enables the evaluation of computational models for three different SID subtasks with increasing levels of complexity. SPeDaC 1 concerns binary classification: a model has to detect whether a sentence contains sensitive information or not. In SPeDaC 2, we collected labeled sentences using 5 categories that relate to macro-domains of personal information. In SPeDaC 3, the labeling is fine-grained (61 personal data categories). We conduct an extensive evaluation of the resource using different state-of-the-art classifiers. The results show that SPeDaC is challenging, particularly with regard to fine-grained classification. The transformer models achieve the best results (acc. RoBERTa on SPeDaC 1 = 98.20%, DeBERTa on SPeDaC 2 = 95.81% and SPeDaC 3 = 77.63%).
Submitted 21 December, 2022; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Bitcoin: a new Proof-of-Work system with reduced variance
Authors:
Danilo Bazzanella,
Andrea Gangemi
Abstract:
Proof-of-Work (PoW) is a popular consensus protocol used by Bitcoin since its inception. PoW has the well-known flaw of assigning all the reward to the single miner (or pool) that inserts the new block. As a consequence, the variance of the reward, and thus the risk of the mining enterprise, is extremely high. To address this problem, Shi in 2016 proposed a theoretical algorithm that would substantially reduce the issue. We introduce a variant of Proof-of-Work that improves on Shi's idea and can be easily implemented in practice. In order to insert a block, the network must find not a single nonce but a few of them. This small change allows for a fairer distribution of rewards and at the same time has the effect of regularizing the insertion time of blocks. This would facilitate the emergence of small pools or autonomous miners.
Submitted 12 July, 2022;
originally announced July 2022.
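The abstract above states that requiring a few nonces per block regularizes block insertion time. The following is a minimal simulation sketch of that effect, not the authors' protocol. It assumes (our assumption) that each nonce search is memoryless, so its duration is exponentially distributed, and that the per-nonce difficulty is scaled so the mean block time stays the same for every `k`. The function names `block_times` and `summarize` are hypothetical.

```python
import random
import statistics


def block_times(k, n_blocks=20000, mean_time=1.0, seed=0):
    """Simulate block insertion times when k nonces are required per block.

    Assumption (ours, for illustration): each nonce search is memoryless,
    so its duration is exponential, and the per-nonce difficulty is scaled
    so that the mean block time equals mean_time for every k.
    """
    rng = random.Random(seed)
    rate = k / mean_time  # each nonce takes mean_time / k on average
    return [
        sum(rng.expovariate(rate) for _ in range(k))
        for _ in range(n_blocks)
    ]


def summarize(k):
    """Return (mean, variance) of the simulated block times for a given k."""
    times = block_times(k)
    return statistics.fmean(times), statistics.pvariance(times)


if __name__ == "__main__":
    for k in (1, 4, 16):
        mean, var = summarize(k)
        print(f"k={k:2d}  mean={mean:.3f}  variance={var:.3f}")
```

Under these assumptions the block time is a sum of `k` independent exponentials, so its variance is `mean_time**2 / k`: the mean is unchanged while the spread shrinks as more nonces are required.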
-
Special subsets of addresses for blockchains using the secp256k1 curve
Authors:
Antonio J. Di Scala,
Andrea Gangemi,
Giuliano Romeo,
Gabriele Vernetti
Abstract:
In 2020, Sala, Sogiorno and Taufer were able to find the private keys of some Bitcoin addresses, and thus to spend the cryptocurrency linked to them. This result was unexpected, since the recovery of non-trivial private keys for blockchain addresses is deemed to be an infeasible problem. In this paper we widen this analysis by mounting a similar attack on other small subsets of the set of private keys. We then apply it to other blockchains as well, examining Ethereum, Dogecoin, Litecoin, Dash, Zcash and Bitcoin Cash. In addition to the results, we also explain the techniques we have used to perform this exhaustive search for all the addresses that have ever appeared in these blockchains.
Submitted 28 June, 2022;
originally announced June 2022.
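To illustrate what an exhaustive search over a small subset of private keys looks like, here is a minimal sketch over secp256k1. It is not the authors' technique. As a simplifying assumption it takes the subset to be the integers `1..limit` and matches public-key points directly, whereas real blockchains publish hashed and encoded addresses (that step is omitted). The helper names `add`, `multiply` and `search_small_keys` are hypothetical.

```python
# secp256k1 field prime and base point (standard public constants).
P = 2**256 - 2**32 - 977
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def add(a, b):
    """Add two points on y^2 = x^3 + 7 (mod P); None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) is the point at infinity
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord line
    x3 = (slope * slope - x1 - x2) % P
    return x3, (slope * (x1 - x3) - y1) % P


def multiply(k, point):
    """Scalar multiplication k * point by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def search_small_keys(targets, limit):
    """Return {public_point: private_key} for targets whose key is in 1..limit.

    Assumption (ours): targets is a set of public-key points; a real search
    would compare derived addresses instead.
    """
    found = {}
    point = None
    for k in range(1, limit + 1):
        point = add(point, G)  # invariant: point == k * G
        if point in targets:
            found[point] = k
    return found
```

Stepping from `k * G` to `(k + 1) * G` with a single point addition keeps the cost of scanning a contiguous range linear in its size, which is the main design choice in this sketch.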
-
Automatically Drafting Ontologies from Competency Questions with FrODO
Authors:
Aldo Gangemi,
Anna Sofia Lippolis,
Giorgia Lodi,
Andrea Giovanni Nuzzolese
Abstract:
We present the Frame-based ontology Design Outlet (FrODO), a novel method and tool for drafting ontologies from competency questions automatically. Competency questions are expressed in natural language and are a common solution for representing requirements in a number of agile ontology engineering methodologies, such as eXtreme Design (XD) or SAMOD. FrODO builds on top of FRED. In fact, it leverages frame semantics for drawing domain-relevant boundaries around the RDF produced by FRED from a competency question, thus drafting domain ontologies. We carried out a user-based study for assessing FrODO in supporting engineers in ontology design tasks. The study shows that FrODO is effective in this and the resulting ontology drafts are qualitative.
Submitted 3 August, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
The HaMSE Ontology: Using Semantic Technologies to support Music Representation Interoperability and Musicological Analysis
Authors:
Andrea Poltronieri,
Aldo Gangemi
Abstract:
The use of Semantic Technologies - in particular the Semantic Web - has proven to be a great tool for describing the cultural heritage domain and artistic practices. However, the panorama of ontologies for musicological applications seems to be limited and restricted to specific applications. In this research, we propose HaMSE, an ontology capable of describing musical features that can assist musicological research. More specifically, HaMSE proposes to address issues that have been affecting musicological research for decades: the representation of music and the relationship between quantitative and qualitative data. To do this, HaMSE allows the alignment between different music representation systems and describes a set of musicological features that can support music analysis at different granularity levels.
Submitted 11 February, 2022;
originally announced February 2022.
-
A Knowledge Graph Embeddings based Approach for Author Name Disambiguation using Literals
Authors:
Cristian Santini,
Genet Asefa Gesese,
Silvio Peroni,
Aldo Gangemi,
Harald Sack,
Mehwish Alam
Abstract:
Scholarly data is growing continuously and contains information about articles from a plethora of venues, including conferences, journals, etc. Many initiatives have been taken to make scholarly data available as Knowledge Graphs (KGs). These efforts to standardize these data and make them accessible have also led to many challenges such as exploration of scholarly articles, ambiguous authors, etc. This study more specifically targets the problem of Author Name Disambiguation (AND) on Scholarly KGs and presents a novel framework, Literally Author Name Disambiguation (LAND), which utilizes Knowledge Graph Embeddings (KGEs) using multimodal literal information generated from these KGs. This framework is based on three components: 1) Multimodal KGEs, 2) A blocking procedure, and finally, 3) Hierarchical Agglomerative Clustering. Extensive experiments have been conducted on two newly created KGs: (i) a KG containing information from Scientometrics Journal from 1978 onwards (OC-782K), and (ii) a KG extracted from a well-known benchmark for AND provided by AMiner (AMiner-534K). The results show that our proposed architecture outperforms our baselines by 8-14% in terms of the F1 score and shows competitive performance on a challenging benchmark such as AMiner. The code and the datasets are publicly available through Github: https://github.com/sntcristian/and-kge and Zenodo: https://doi.org/10.5281/zenodo.6309855 respectively.
Submitted 1 June, 2022; v1 submitted 24 January, 2022;
originally announced January 2022.
-
Marriage is a Peach and a Chalice: Modelling Cultural Symbolism on the Semantic Web
Authors:
Bruno Sartini,
Marieke van Erp,
Aldo Gangemi
Abstract:
In this work, we fill the gap in the Semantic Web in the context of Cultural Symbolism. Building upon earlier work, we introduce the Simulation Ontology, an ontology that models the background knowledge of symbolic meanings, developed by combining the concepts taken from the authoritative theory of Simulacra and Simulations of Jean Baudrillard with symbolic structures and content taken from "Symbolism: a Comprehensive Dictionary" by Steven Olderr. We re-engineered the symbolic knowledge already present in heterogeneous resources by converting it into our ontology schema to create HyperReal, the first knowledge graph completely dedicated to cultural symbolism. A first experiment run on the knowledge graph is presented to show the potential of quantitative research on symbolism.
Submitted 3 November, 2021;
originally announced November 2021.
-
Graph-based Retrieval for Claim Verification over Cross-Document Evidence
Authors:
Misael Mongiovì,
Aldo Gangemi
Abstract:
Verifying the veracity of claims requires reasoning over a large knowledge base, often in the form of corpora of trustworthy sources. A common approach consists in retrieving short portions of relevant text from the reference documents and giving them as input to a natural language inference module that determines whether the claim can be inferred from or contradicted by them. This approach, however, struggles when multiple pieces of evidence need to be collected and combined from different documents, since the single documents are often barely related to the target claim and hence are left out by the retrieval module. We conjecture that a graph-based approach can be beneficial to identify fragmented evidence. We tested this hypothesis by building, over the whole corpus, a large graph that interconnects text portions by means of mentioned entities and exploiting such a graph for identifying candidate sets of evidence from multiple sources. Our experiments show that leveraging a graph structure is beneficial in identifying a reasonably small portion of passages related to a claim.
Submitted 13 September, 2021;
originally announced September 2021.
-
Facade-X: an opinionated approach to SPARQL anything
Authors:
Enrico Daga,
Luigi Asprino,
Paul Mulholland,
Aldo Gangemi
Abstract:
The Semantic Web research community has understood since its beginning how crucial it is to equip practitioners with methods to transform non-RDF resources into RDF. Proposals focus on either engineering content transformations or accessing non-RDF resources with SPARQL. Existing solutions require users to learn specific mapping languages (e.g. RML), to know how to query and manipulate a variety of source formats (e.g. XPATH, JSON-Path), or to combine multiple languages (e.g. SPARQL Generate). In this paper, we explore an alternative solution and contribute a general-purpose meta-model for converting non-RDF resources into RDF: Facade-X. Our approach can be implemented by overriding the SERVICE operator and does not require extending the SPARQL syntax. We compare our approach with the state-of-the-art methods RML and SPARQL Generate and show that our solution has lower learning demands and cognitive complexity, and is cheaper to implement and maintain, while having comparable extensibility and efficiency.
Submitted 4 June, 2021;
originally announced June 2021.
-
Interval Probabilistic Fuzzy WordNet
Authors:
Yousef Alizadeh-Q,
Behrouz Minaei-Bidgoli,
Sayyed-Ali Hossayni,
Mohammad-R Akbarzadeh-T,
Diego Reforgiato Recupero,
Mohammad-Reza Rajati,
Aldo Gangemi
Abstract:
The WordNet lexical database groups English words into sets of synonyms called "synsets." Synsets are used in several text-mining applications. However, they have been criticized because, although in reality not all members of a synset represent its meaning to the same degree, in practice they are all treated identically as members of the synset. Thus, fuzzy versions of synsets, called fuzzy synsets (or fuzzy word-sense classes), were proposed and studied. In this study, we discuss why (type-1) fuzzy synsets (T1 F-synsets) do not properly model the membership uncertainty, and propose an upgraded version of fuzzy synsets in which membership degrees of word-senses are represented by intervals, similar to Interval Type 2 Fuzzy Sets (IT2 FS). We then argue that the IT2 FS theoretical framework is insufficient for the analysis and design of such synsets, and propose a new concept, called Interval Probabilistic Fuzzy (IPF) sets. Then we present an algorithm for constructing the IPF synsets in any language, given a corpus and a word-sense-disambiguation system. Using our algorithm, the Open American National Corpus (OANC) and the UKB word-sense-disambiguation system, we constructed and published the IPF synsets of WordNet for the English language.
Submitted 4 April, 2021;
originally announced April 2021.
-
An Ontology Design Pattern for representing Recurrent Situations
Authors:
Valentina Anita Carriero,
Aldo Gangemi,
Andrea Giovanni Nuzzolese,
Valentina Presutti
Abstract:
In this paper, we present an Ontology Design Pattern for representing situations that recur at regular periods and share some invariant factors, which unify them conceptually: we refer to this set of recurring situations as recurrent situation series. The proposed pattern appears to be foundational, since it can be generalised for modelling the top-level domain-independent concept of recurrence, which is strictly associated with invariance. The pattern reuses other foundational patterns such as Collection, Description and Situation, Classification, Sequence. Indeed, a recurrent situation series is formalised as both a collection of situations occurring regularly over time and unified according to some properties that are common to all the members, and a situation itself, which provides a relational context to its members that satisfy a reference description. Besides including some exemplifying instances of this pattern, we show how it has been implemented and specialised to model recurrent cultural events and ceremonies in ArCo, the Knowledge Graph of Italian cultural heritage.
Submitted 1 January, 2021;
originally announced January 2021.
-
The Landscape of Ontology Reuse Approaches
Authors:
Valentina Anita Carriero,
Marilena Daquino,
Aldo Gangemi,
Andrea Giovanni Nuzzolese,
Silvio Peroni,
Valentina Presutti,
Francesca Tomasi
Abstract:
Ontology reuse aims to foster interoperability and facilitate knowledge reuse. Several approaches are typically evaluated by ontology engineers when bootstrapping a new project. However, current practices are often motivated by subjective, case-by-case decisions, which hamper the definition of a recommended behaviour. In this chapter we argue that to date there are no effective solutions for supporting developers' decision-making process when deciding on an ontology reuse strategy. The objective is twofold: (i) to survey current approaches to ontology reuse, presenting motivations, strategies, benefits and limits, and (ii) to analyse two representative approaches and discuss their merits.
Submitted 25 November, 2020;
originally announced November 2020.
-
An Algorithm for Fuzzification of WordNets, Supported by a Mathematical Proof
Authors:
Sayyed-Ali Hossayni,
Mohammad-R Akbarzadeh-T,
Diego Reforgiato Recupero,
Aldo Gangemi,
Esteve Del Acebo,
Josep Lluís de la Rosa i Esteva
Abstract:
WordNet-like Lexical Databases (WLDs) group English words into sets of synonyms called "synsets." Although the standard WLDs are being used in many successful Text-Mining applications, they have the limitation that word-senses are considered to represent the meaning associated with their corresponding synsets to the same degree, which is not generally true. In order to overcome this limitation, several fuzzy versions of synsets have been proposed. A common trait of these studies is that, to the best of our knowledge, they do not aim to produce fuzzified versions of the existing WLDs, but build new WLDs from scratch, which has limited the attention received from the Text-Mining community, many of whose resources and applications are based on the existing WLDs. In this study, we present an algorithm for constructing fuzzy versions of WLDs of any language, given a corpus of documents and a word-sense disambiguation (WSD) system for that language. Then, using the Open-American-National-Corpus and UKB WSD as algorithm inputs, we construct and publish online the fuzzified version of English WordNet (FWN). We also propose a theoretical (mathematical) proof of the validity of its results.
Submitted 7 June, 2020;
originally announced June 2020.
-
Using altmetrics for detecting impactful research in quasi-zero-day time-windows: the case of COVID-19
Authors:
Erik Boetto,
Maria Pia Fantini,
Aldo Gangemi,
Davide Golinelli,
Manfredi Greco,
Andrea Giovanni Nuzzolese,
Valentina Presutti,
Flavia Rallo
Abstract:
On December 31st 2019, the World Health Organization (WHO) China Country Office was informed of cases of pneumonia of unknown etiology detected in Wuhan City. The cause of the syndrome was a new type of coronavirus isolated on January 7th 2020 and named Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2). SARS-CoV-2 is the cause of the coronavirus disease 2019 (COVID-19). Since January 2020 an ever increasing number of scientific works have appeared in the literature. Identifying relevant research outcomes at very early stages is challenging. In this work we use COVID-19 as a use-case for investigating: (i) which tools and frameworks are mostly used for early scholarly communication; (ii) to what extent altmetrics can be used to identify potentially impactful research in tight (i.e. quasi-zero-day) time-windows. A literature review with rigorous eligibility criteria is performed for gathering a sample composed of scientific papers about SARS-CoV-2/COVID-19 that appeared in the literature in the tight time-window ranging from January 15th 2020 to February 24th 2020. This sample is used for building a knowledge graph that formally represents the knowledge about papers and indicators. This knowledge graph feeds a data analysis process which is applied for experimenting with altmetrics as impact indicators. We find moderate correlation among traditional citation count, citations on social media, and mentions on news and blogs. This suggests there is a common intended meaning of the citational acts associated with the aforementioned indicators. Additionally, we define a method that harmonises different indicators for providing a multi-dimensional impact indicator.
Submitted 16 November, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Pattern-based design applied to cultural heritage knowledge graphs
Authors:
Valentina Anita Carriero,
Aldo Gangemi,
Maria Letizia Mancinelli,
Andrea Giovanni Nuzzolese,
Valentina Presutti,
Chiara Veninata
Abstract:
Ontology Design Patterns (ODPs) have become an established and recognised practice for guaranteeing good quality ontology engineering. There are several ODP repositories where ODPs are shared as well as ontology design methodologies recommending their reuse. Performing rigorous testing is recommended as well for supporting ontology maintenance and validating the resulting resource against its motivating requirements. Nevertheless, it is less than straightforward to find guidelines on how to apply such methodologies for developing domain-specific knowledge graphs. ArCo is the knowledge graph of Italian Cultural Heritage (CH) and has been developed by using eXtreme Design (XD), an ODP- and test-driven methodology. During its development, XD has been adapted to the needs of the CH domain (e.g. gathering requirements from an open, diverse community of consumers), a new ODP has been defined, and many have been specialised to address specific CH requirements. This paper presents ArCo and describes how to apply XD to the development and validation of a CH knowledge graph, also detailing the (intellectual) process implemented for matching the encountered modelling problems to ODPs. Relevant contributions also include a novel web tool for supporting unit-testing of knowledge graphs, a rigorous evaluation of ArCo, and a discussion of methodological lessons learned during ArCo development.
Submitted 20 June, 2020; v1 submitted 18 November, 2019;
originally announced November 2019.
-
ArCo: the Italian Cultural Heritage Knowledge Graph
Authors:
Valentina Anita Carriero,
Aldo Gangemi,
Maria Letizia Mancinelli,
Ludovica Marinucci,
Andrea Giovanni Nuzzolese,
Valentina Presutti,
Chiara Veninata
Abstract:
ArCo is the Italian Cultural Heritage knowledge graph, consisting of a network of seven vocabularies and 169 million triples about 820 thousand cultural entities. It is distributed jointly with a SPARQL endpoint, software for converting catalogue records to RDF, and a rich suite of documentation material (testing, evaluation, how-to, examples, etc.). ArCo is based on the official General Catalogue of the Italian Ministry of Cultural Heritage and Activities (MiBAC) - and its associated encoding regulations - which collects and validates the catalogue records of (ideally) all Italian Cultural Heritage (CH) properties (excluding libraries and archives), contributed by CH administrators from all over Italy. We present its structure, design methods and tools, its growing community, and delineate its importance, quality, and impact.
Submitted 7 May, 2019;
originally announced May 2019.
-
Linked Open Data Validity -- A Technical Report from ISWS 2018
Authors:
Tayeb Abderrahmani Ghor,
Esha Agrawal,
Mehwish Alam,
Omar Alqawasmeh,
Claudia D'amato,
Amina Annane,
Amr Azzam,
Andrew Berezovskyi,
Russa Biswas,
Mathias Bonduel,
Quentin Brabant,
Cristina-iulia Bucur,
Elena Camossi,
Valentina Anita Carriero,
Shruthi Chari,
David Chaves Fraga,
Fiorela Ciroku,
Michael Cochez,
Hubert Curien,
Vincenzo Cutrona,
Rahma Dandan,
Danilo Dess,
Valerio Di Carlo,
Ahmed El Amine Djebri,
Marieke Van Erp
, et al. (46 additional authors not shown)
Abstract:
Linked Open Data (LOD) is the publicly available RDF data on the Web. Each LOD entity is identified by a URI and accessible via HTTP. LOD encodes global-scale knowledge potentially available to any human as well as artificial intelligence that may want to benefit from it as background knowledge for supporting their tasks. LOD has emerged as the backbone of applications in diverse fields such as Natural Language Processing, Information Retrieval, Computer Vision, Speech Recognition, and many more. Nevertheless, regardless of the specific tasks that LOD-based tools aim to address, the reuse of such knowledge may be challenging for diverse reasons, e.g. semantic heterogeneity, provenance, and data quality. As aptly stated by Heath et al., "Linked Data might be outdated, imprecise, or simply wrong": there arises a need to investigate the problem of linked data validity. This work reports a collaborative effort performed by nine teams of students, guided by an equal number of senior researchers, attending the International Semantic Web Research School (ISWS 2018), towards addressing such investigation from different perspectives coupled with different approaches to tackle the issue.
Submitted 26 March, 2019;
originally announced March 2019.
-
The practice of self-citations: a longitudinal study
Authors:
Silvio Peroni,
Paolo Ciancarini,
Aldo Gangemi,
Andrea Giovanni Nuzzolese,
Francesco Poggi,
Valentina Presutti
Abstract:
In this article, we discuss the outcomes of an experiment where we analysed whether and to what extent the introduction, in 2012, of the new research assessment exercise in Italy (a.k.a. Italian Scientific Habilitation) affected self-citation behaviours in the Italian research community. The Italian Scientific Habilitation attests to the scientific maturity of researchers and in Italy, as in many other countries, is a requirement for accessing a professorship. To this end, we obtained from ScienceDirect 35,673 articles published between 1957 and 2016 by the participants in the 2012 Italian Scientific Habilitation, which resulted in the extraction of 1,379,050 citations retrieved through Semantic Publishing technologies. Our analysis showed an overall increment in author self-citations (i.e. where the citing article and the cited article share at least one author) in several of the 24 academic disciplines considered. However, we found a stronger causal relation between such increment and the rules introduced by the 2012 Italian Scientific Habilitation in 10 out of the 24 disciplines analysed.
Submitted 19 February, 2020; v1 submitted 14 March, 2019;
originally announced March 2019.
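The abstract above defines an author self-citation as a citation where the citing and the cited article share at least one author. A minimal sketch of that definition follows. It assumes (our simplification) that authors can be matched by a case- and whitespace-normalised name string, whereas real bibliographic data usually needs proper author disambiguation. The function names are hypothetical and are not taken from the paper.

```python
def _normalize(name):
    """Lowercase a name and collapse runs of whitespace (naive matching key)."""
    return " ".join(name.lower().split())


def is_author_self_citation(citing_authors, cited_authors):
    """True when the citing and cited articles share at least one author."""
    citing = {_normalize(a) for a in citing_authors}
    cited = {_normalize(a) for a in cited_authors}
    return not citing.isdisjoint(cited)


def self_citation_rate(citations):
    """Fraction of (citing_authors, cited_authors) pairs that are self-citations."""
    citations = list(citations)
    if not citations:
        return 0.0
    hits = sum(is_author_self_citation(c, d) for c, d in citations)
    return hits / len(citations)
```

For example, `is_author_self_citation(["A. Rossi", "B. Bianchi"], ["a. rossi"])` is true under this matching rule, because the two author lists share one normalised name.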
-
Do altmetrics work for assessing research quality?
Authors:
Andrea Giovanni Nuzzolese,
Paolo Ciancarini,
Aldo Gangemi,
Silvio Peroni,
Francesco Poggi,
Valentina Presutti
Abstract:
Alternative metrics (aka altmetrics) are gaining increasing interest in the scientometrics community as they can capture both the volume and quality of attention that a research work receives online. Nevertheless, there is limited knowledge about their effectiveness as a means of measuring the impact of research compared to traditional citation-based indicators. This work aims at rigorously investigating whether any correlation exists among indicators, either traditional (i.e. citation count and h-index) or alternative (i.e. altmetrics), and which of them may be effective for evaluating scholars. The study is based on the analysis of real data coming from the National Scientific Qualification procedure held in Italy by committees of peers on behalf of the Italian Ministry of Education, Universities and Research.
Submitted 31 December, 2018;
originally announced December 2018.
-
Semantic Role Labeling for Knowledge Graph Extraction from Text
Authors:
Mehwish Alam,
Aldo Gangemi,
Valentina Presutti,
Diego Reforgiato Recupero
Abstract:
This paper introduces TakeFive, a new semantic role labeling method that transforms a text into a frame-oriented knowledge graph. It performs dependency parsing, identifies the words that evoke lexical frames, locates the roles and fillers for each frame, runs coercion techniques, and formalises the results as a knowledge graph. This formal representation complies with the frame semantics used in Framester, a factual-linguistic linked data resource. The obtained precision, recall and F1 values indicate that TakeFive is competitive with other existing methods such as SEMAFOR, Pikes, PathLSTM and FRED. We finally discuss how to combine TakeFive and FRED, obtaining higher values of precision, recall and F1.
Submitted 4 November, 2018;
originally announced November 2018.
-
Amnestic Forgery: an Ontology of Conceptual Metaphors
Authors:
Aldo Gangemi,
Mehwish Alam,
Valentina Presutti
Abstract:
This paper presents Amnestic Forgery, an ontology for metaphor semantics, based on MetaNet, which is inspired by the theory of Conceptual Metaphor. Amnestic Forgery reuses and extends the Framester schema, as an ideal ontology design framework to deal with both semiotic and referential aspects of frames, roles, mappings, and eventually blending. The description of the resource is supplemented by a discussion of its applications, with examples taken from metaphor generation, and the referential problems of metaphoric mappings. Both schema and data are available from the Framester SPARQL endpoint.
Submitted 30 May, 2018;
originally announced May 2018.
-
An Innovative, Open, Interoperable Citizen Engagement Cloud Platform for Smart Government and Users' Interaction
Authors:
Diego Reforgiato Recupero,
Mario Castronovo,
Sergio Consoli,
Tarcisio Costanzo,
Aldo Gangemi,
Luigi Grasso,
Giorgia Lodi,
Gianluca Merendino,
Misael Mongiovì,
Valentina Presutti,
Salvatore Davide Rapisarda,
Salvo Rosa,
Emanuele Spampinato
Abstract:
This paper introduces an open, interoperable, and cloud-computing-based citizen engagement platform for the management of administrative processes of public administrations, which also increases the engagement of citizens. The citizen engagement platform is the outcome of a 3-year Italian national project called PRISMA (Interoperable cloud platforms for smart government). The aim of the project is to constitute a new model of digital ecosystem that can support and enable new methods of interaction among public administrations, citizens, companies, and other stakeholders surrounding cities. The platform has been described by the media as a flexible (enabling the addition of any kind of application or service) and open (enabling access to open services) Italian "cloud" that allows public administrations to access a vast knowledge base represented as linked open data, to be reused by a stakeholder community with the aim of developing new applications ("Cloud Apps") tailored to the specific needs of citizens. The platform has been used by the municipalities of Catania and Syracuse, two of the main cities of southern Italy, located in the Sicilian region. Full adoption of the platform is rapidly spreading across the whole region (local developers have already used the available application programming interfaces (APIs) to create additional services for citizens and administrations), to such an extent that other provinces of Sicily, and of Italy in general, have expressed interest in using it. The platform is available online and, as mentioned above, is open source and provides APIs for full exploitation.
Submitted 24 May, 2016;
originally announced May 2016.
-
Conceptual Analysis of Lexical Taxonomies: The Case of WordNet Top-Level
Authors:
Aldo Gangemi,
Nicola Guarino,
Alessandro Oltramari
Abstract:
In this paper we propose an analysis and an upgrade of WordNet's top-level synset taxonomy. We briefly review WordNet and identify its main semantic limitations. Some principles from a forthcoming OntoClean methodology are applied to the ontological analysis of WordNet. A revised top-level taxonomy is proposed, which is meant to be more conceptually rigorous, cognitively transparent, and efficiently exploitable in several applications.
Submitted 11 September, 2001;
originally announced September 2001.