-
Article Classification with Graph Neural Networks and Multigraphs
Authors:
Khang Ly,
Yury Kashnitsky,
Savvas Chamezopoulos,
Valeria Krzhizhanovskaya
Abstract:
Classifying research output into context-specific label taxonomies is a challenging and relevant downstream task, given the volume of existing and newly published articles. We propose a method to enhance the performance of article classification by enriching simple Graph Neural Network (GNN) pipelines with multi-graph representations that simultaneously encode multiple signals of article relatedness, e.g. references, co-authorship, shared publication source, shared subject headings, as distinct edge types. Fully supervised transductive node classification experiments are conducted on the Open Graph Benchmark OGBN-arXiv dataset and the PubMed diabetes dataset, augmented with additional metadata from Microsoft Academic Graph and PubMed Central, respectively. The results demonstrate that multi-graphs consistently improve the performance of a variety of GNN models compared to the default graphs. When deployed with SOTA textual node embedding methods, the transformed multi-graphs enable simple and shallow 2-layer GNN pipelines to achieve results on par with more complex architectures.
Submitted 28 May, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Evaluating approaches to identifying research supporting the United Nations Sustainable Development Goals
Authors:
Yury Kashnitsky,
Guillaume Roberge,
Jingwen Mu,
Kevin Kang,
Weiwei Wang,
Maurice Vanderfeesten,
Maxim Rivest,
Savvas Chamezopoulos,
Robert Jaworek,
Maéva Vignes,
Bamini Jayabalasingham,
Finne Boonen,
Chris James,
Marius Doornenbal,
Isabelle Labrosse
Abstract:
The United Nations (UN) Sustainable Development Goals (SDGs) challenge the global community to build a world where no one is left behind. Recognizing that research plays a fundamental part in supporting these goals, attempts have been made to classify research publications according to their relevance in supporting each of the UN's SDGs. In this paper, we outline the methodology that we followed when mapping research articles to SDGs and which is adopted by Times Higher Education in their Social Impact rankings. We compare our solution with other existing queries and models mapping research papers to SDGs. We also discuss various aspects in which the methodology can be improved and generalized to other types of content apart from research articles. The results presented in this paper are the outcome of the SDG Research Mapping Initiative that was established as a partnership between the University of Southern Denmark, the Aurora European Universities Alliance (represented by Vrije Universiteit Amsterdam), the University of Auckland, and Elsevier to bring together broad expertise and share best practices on identifying research contributions to UN's Sustainable Development Goals.
Submitted 1 December, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
How near-duplicate detection improves editors' and authors' publishing experience
Authors:
Yury Kashnitsky,
Vaishnavi Kandala,
Egbert van Wezenbeek,
IJsbrand Jan Aalbersberg,
Catriona Fennell,
Georgios Tsatsaronis
Abstract:
We describe a system that helps identify manuscripts submitted to multiple journals at the same time. Also, we discuss potential applications of the near-duplicate detection technology when run with manuscript text content, including identification of simultaneous submissions, prevention of duplicate published articles, and improving article transfer service.
Submitted 10 August, 2021;
originally announced August 2021.
-
Resolving Gendered Ambiguous Pronouns with BERT
Authors:
Matei Ionita,
Yury Kashnitsky,
Ken Krige,
Vladimir Larin,
Denis Logvinenko,
Atanas Atanasov
Abstract:
Pronoun resolution is part of coreference resolution, the task of pairing an expression to its referring entity. This is an important task for natural language understanding and a necessary component of machine translation systems, chat bots and assistants. Neural machine learning systems perform far from ideally in this task, reaching as low as 73% F1 scores on modern benchmark datasets. Moreover, they tend to perform better for masculine pronouns than for feminine ones. Thus, the problem is both challenging and important for NLP researchers and practitioners. In this project, we describe our BERT-based approach to solving the problem of gender-balanced pronoun resolution. We are able to reach 92% F1 score and a much lower gender bias on the benchmark dataset shared by Google AI Language team.
Submitted 13 June, 2019; v1 submitted 3 June, 2019;
originally announced June 2019.
-
Can FCA-based Recommender System Suggest a Proper Classifier?
Authors:
Yury Kashnitsky,
Dmitry I. Ignatov
Abstract:
The paper briefly introduces multiple classifier systems and describes a new algorithm, which improves classification accuracy by means of recommendation of a proper algorithm to an object classification. This recommendation is done assuming that a classifier is likely to predict the label of the object correctly if it has correctly classified its neighbors. The process of assigning a classifier to each object is based on Formal Concept Analysis. We explain the idea of the algorithm with a toy example and describe our first experiments with real-world datasets.
Submitted 21 April, 2015;
originally announced April 2015.
-
Visual analytics in FCA-based clustering
Authors:
Yury Kashnitsky
Abstract:
Visual analytics is a subdomain of data analysis which combines both human and machine analytical abilities and is applied mostly in decision-making and data mining tasks. Triclustering, based on Formal Concept Analysis (FCA), was developed to detect groups of objects with similar properties under similar conditions. It is used in Social Network Analysis (SNA) and is a basis for certain types of recommender systems. The problem with triclustering algorithms is that they do not always produce meaningful clusters. This article describes a specific triclustering algorithm and a prototype of a visual analytics platform for working with the obtained clusters. This tool is designed as a testing framework and is intended to help an analyst grasp the results of triclustering and recommender algorithms, and to make decisions on the meaningfulness of certain triclusters and recommendations.
Submitted 21 April, 2015;
originally announced April 2015.
-
Graphlet-based lazy associative graph classification
Authors:
Yury Kashnitsky,
Sergei O. Kuznetsov
Abstract:
The paper addresses the graph classification problem and introduces a modification of the lazy associative classification method to efficiently handle intersections of graphs. Graph intersections are approximated with all common subgraphs up to a fixed size similarly to what is done with graphlet kernels. We explain the idea of the algorithm with a toy example and describe our experiments with a predictive toxicology dataset.
Submitted 13 May, 2015; v1 submitted 21 April, 2015;
originally announced April 2015.