-
Mining Frequent Structures in Conceptual Models
Authors:
Mattia Fumagalli,
Tiago Prince Sales,
Pedro Paulo F. Barcelos,
Giovanni Micale,
Philipp-Lorenz Glaser,
Dominik Bork,
Vadim Zaytsev,
Diego Calvanese,
Giancarlo Guizzardi
Abstract:
The problem of using structured methods to represent knowledge is well-known in conceptual modeling and has been studied for many years. It has been proven that adopting modeling patterns represents an effective structural method. Patterns are, indeed, generalizable recurrent structures that can be exploited as solutions to design problems. They aid in understanding and improving the process of creating models. The undeniable value of using patterns in conceptual modeling was demonstrated in several experimental studies. However, discovering patterns in conceptual models is widely recognized as a highly complex task and a systematic solution to pattern identification is currently lacking. In this paper, we propose a general approach to the problem of discovering frequent structures, as they occur in conceptual modeling languages. As proof of concept, we implement our approach by focusing on two widely-used conceptual modeling languages. This implementation includes an exploratory tool that integrates a frequent subgraph mining algorithm with graph manipulation techniques. The tool processes multiple conceptual models and identifies recurrent structures based on various criteria. We validate the tool using two state-of-the-art curated datasets: one consisting of models encoded in OntoUML and the other in ArchiMate. The primary objective of our approach is to provide a support tool for language engineers. This tool can be used to identify both effective and ineffective modeling practices, enabling the refinement and evolution of conceptual modeling languages. Furthermore, it facilitates the reuse of accumulated expertise, ultimately supporting the creation of higher-quality models in a given language.
Submitted 25 December, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Combining Reward and Rank Signals for Slate Recommendation
Authors:
Imad Aouali,
Sergey Ivanov,
Mike Gartrell,
David Rohde,
Flavian Vasile,
Victor Zaytsev,
Diego Legrand
Abstract:
We consider the problem of slate recommendation, where the recommender system presents a user with a collection or slate composed of K recommended items at once. If the user finds the recommended items appealing then the user may click and the recommender system receives some feedback. Two pieces of information are available to the recommender system: was the slate clicked? (the reward), and if the slate was clicked, which item was clicked? (rank). In this paper, we formulate several Bayesian models that incorporate the reward signal (Reward model), the rank signal (Rank model), or both (Full model), for non-personalized slate recommendation. In our experiments, we analyze performance gains of the Full model and show that it achieves significantly lower error as the number of products in the catalog grows or as the slate size increases.
Submitted 29 July, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Does Python Smell Like Java? Tool Support for Design Defect Discovery in Python
Authors:
Nicole Vavrová,
Vadim Zaytsev
Abstract:
The context of this work is specification, detection and ultimately removal of detectable harmful patterns in source code that are associated with defects in design and implementation of software. In particular, we investigate five code smells and four antipatterns previously defined in papers and books. Our inquiry is about detecting those in source code written in the Python programming language, which is substantially different from all prior research, most of which concerns Java or C-like languages. Our approach was that of software engineers: we have processed existing research literature on the topic, extracted both the abstract definitions of nine design defects and their concrete implementation specifications, implemented them all in a tool we have programmed and let it loose on a huge test set obtained from open source code from thousands of GitHub projects. When it comes to knowledge, we have found that more than twice as many methods in Python can be considered too long (statistically extremely longer than their neighbours within the same project) than in Java, but long parameter lists are seven times less likely to be found in Python code than in Java code. We have also found that Functional Decomposition, the way it was defined for Java, is not found in the Python code at all, and Spaghetti Code and God Classes are extremely rare there as well. The grounding and the confidence in these results come from the fact that we have performed our experiments on 32'058'823 lines of Python code, which is by far the largest test set for a freely available Python parser. We have also designed the experiment in such a way that it aligned with prior research on design defect detection in Java in order to ease the comparison if we treat our own actions as a replication.
Thus, the importance of the work is both in the unique open Python grammar of highest quality, tested on millions of lines of code, and in the design defect detection tool which works on something other than Java.
Submitted 31 March, 2017;
originally announced March 2017.
-
Guided Grammar Convergence
Authors:
Vadim Zaytsev
Abstract:
Relating formal grammars is a hard problem that balances between language equivalence (which is known to be undecidable) and grammar identity (which is trivial). In this paper, we investigate several milestones between those two extremes and propose a methodology for inconsistency management in grammar engineering. While conventional grammar convergence is a practical approach relying on human experts to encode differences as transformation steps, guided grammar convergence is a more narrowly applicable technique that infers such transformation steps automatically by normalising the grammars and establishing a structural equivalence relation between them. This allows us to perform a case study with automatically inferring bidirectional transformations between 11 grammars (in a broad sense) of the same artificial functional language: parser specifications with different combinator libraries, definite clause grammars, concrete syntax definitions, algebraic data types, metamodels, XML schemata, object models.
Submitted 29 March, 2015;
originally announced March 2015.
-
Generating Conceptual Metaphors from Proposition Stores
Authors:
Ekaterina Ovchinnikova,
Vladimir Zaytsev,
Suzanne Wertheim,
Ross Israel
Abstract:
Contemporary research on computational processing of linguistic metaphors is divided into two main branches: metaphor recognition and metaphor interpretation. We take a different line of research and present an automated method for generating conceptual metaphors from linguistic data. Given the generated conceptual metaphors, we find corresponding linguistic metaphors in corpora. In this paper, we describe our approach and its evaluation using English and Russian data.
Submitted 25 September, 2014;
originally announced September 2014.
-
The Grammar Hammer of 2012
Authors:
Vadim Zaytsev
Abstract:
This document is a case study in aggressive self-archiving. It collects all initiatives undertaken by its author in 2012, including unpublished ones, and explains their relevance and relation to one another. Discussed topics include guided convergence of formal grammars in a broad sense, programmable grammar transformation operator suites, metasyntactic specifications and methods of their manipulation, tolerant (soft computing) methods in parsing theory, megamodelling as modelling linguistic architecture of software systems, repositories of grammatical knowledge, open notebook computer science, as well as a number of minor topics (new parsing algorithms, visualisation techniques, etc.). A brief overview of involved venues is also included in the report.
Submitted 17 December, 2012;
originally announced December 2012.
-
Guided Grammar Convergence. Full Case Study Report. Generated by converge::Guided
Authors:
Vadim Zaytsev
Abstract:
This report is meant to be used as auxiliary material for the guided grammar convergence technique proposed earlier as a problem-specific improvement in the topic of convergence of grammars. It contains a narrated MegaL megamodel, as well as full results of the guided grammar convergence experiment on the Factorial Language, with details about each grammar source packaged in a readable form. All formulae used within this document are generated automatically by the convergence infrastructure in order to avoid any mistakes. The generator source code and the source of the introduction text are publicly available in the Software Language Processing Suite repository.
Submitted 27 July, 2012;
originally announced July 2012.
-
MediaWiki Grammar Recovery
Authors:
Vadim Zaytsev
Abstract:
The paper describes in detail the recovery effort of one of the official MediaWiki grammars. Over two hundred grammar transformation steps are reported and annotated, leading to delivery of a level 2 grammar, semi-automatically extracted from a community-created semi-formal text using at least five different syntactic notations, several non-enforced naming conventions, multiple misspellings, obsolete parsing technology idiosyncrasies and other problems commonly encountered in grammars that were not engineered properly. Having a quality grammar will allow it to be tested and validated further, without alienating the community with a separately developed grammar.
Submitted 23 July, 2011;
originally announced July 2011.
-
Recovering Grammar Relationships for the Java Language Specification
Authors:
Ralf Lämmel,
Vadim Zaytsev
Abstract:
Grammar convergence is a method that helps discover relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
Submitted 24 August, 2010;
originally announced August 2010.