-
Utilizing citation index and synthetic quality measure to compare Wikipedia languages across various topics
Authors:
Włodzimierz Lewoniewski,
Krzysztof Węcel,
Witold Abramowicz
Abstract:
This study presents a comparative analysis of 55 Wikipedia language editions employing a citation index alongside a synthetic quality measure. Specifically, we identified the most significant Wikipedia articles within distinct topical areas, selecting the top 10, top 25, and top 100 most cited articles in each topic and language version. We built this index from the wikilinks between Wikipedia articles in each language version, which required processing 6.6 billion page-to-page link records. Next, we assigned each Wikipedia article a quality score, a synthetic measure scaled from 0 to 100. This approach enabled quality comparison of Wikipedia articles even between language versions with different quality grading schemes. Our results highlight disparities among Wikipedia language editions, revealing strengths and gaps in content coverage and quality across topics.
Submitted 22 May, 2025;
originally announced May 2025.
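The citation index described in the abstract above can be illustrated as an in-link count over page-to-page link records, followed by a top-k selection within each topic. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `(source, target)` record layout, the `topic_of` mapping, the handling of duplicates and self-links, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def citation_counts(links):
    """Count incoming wikilinks per target article.

    `links` is an iterable of (source, target) pairs. Duplicate pairs and
    self-links are ignored; this is an assumption, the abstract does not say.
    """
    unique = {(src, dst) for src, dst in links if src != dst}
    return Counter(dst for _, dst in unique)


def top_cited_by_topic(counts, topic_of, k):
    """Return {topic: [article, ...]} with the k most cited articles per topic."""
    by_topic = defaultdict(list)
    for article, n in counts.items():
        topic = topic_of.get(article)
        if topic is not None:
            by_topic[topic].append((n, article))
    return {
        # Sort by descending count, then by title to break ties deterministically.
        topic: [a for _, a in sorted(items, key=lambda x: (-x[0], x[1]))[:k]]
        for topic, items in by_topic.items()
    }


# Toy link records and topic labels (hypothetical data).
links = [("A", "B"), ("C", "B"), ("A", "C"), ("B", "C"),
         ("D", "C"), ("A", "B"), ("D", "D")]
topic_of = {"A": "science", "B": "science", "C": "history", "D": "history"}

counts = citation_counts(links)
top = top_cited_by_topic(counts, topic_of, k=1)
```

In the study, k would be 10, 25, or 100 and the counts would be computed separately for each language version.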
-
OpenFact at CheckThat! 2024: Combining Multiple Attack Methods for Effective Adversarial Text Generation
Authors:
Włodzimierz Lewoniewski,
Piotr Stolarski,
Milena Stróżyna,
Elzbieta Lewańska,
Aleksandra Wojewoda,
Ewelina Księżniak,
Marcin Sawiński
Abstract:
This paper presents the experiments and results for the CheckThat! Lab at CLEF 2024 Task 6: Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE). The primary objective of this task was to generate adversarial examples in five problem domains in order to evaluate the robustness of widely used text classification methods (fine-tuned BERT, BiLSTM, and RoBERTa) when applied to credibility assessment issues.
This study explores the application of ensemble learning to enhance adversarial attacks on natural language processing (NLP) models. We systematically tested and refined several adversarial attack methods, including BERT-Attack, genetic algorithms, TextFooler, and CLARE, on five datasets across various misinformation tasks. By developing modified versions of BERT-Attack and hybrid methods, we achieved significant improvements in attack effectiveness. Our results demonstrate the potential of modifying and combining multiple methods to create more sophisticated and effective adversarial attack strategies, contributing to the development of more robust and secure systems.
Submitted 5 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Measuring Americanization: A Global Quantitative Study of Interest in American Topics on Wikipedia
Authors:
Piotr Konieczny,
Włodzimierz Lewoniewski
Abstract:
We conducted a global comparative analysis of the coverage of American topics in different language versions of Wikipedia, using over 90 million Wikidata items and 40 million Wikipedia articles in 58 languages. Our study aimed to investigate whether Americanization is more or less dominant in different regions and cultures and to determine whether interest in American topics is universal.
Submitted 26 July, 2023;
originally announced July 2023.
-
Reliability in Time: Evaluating the Web Sources of Information on COVID-19 in Wikipedia across Various Language Editions from the Beginning of the Pandemic
Authors:
Włodzimierz Lewoniewski,
Krzysztof Węcel,
Witold Abramowicz
Abstract:
There are over a billion websites on the Internet that can potentially serve as sources of information on various topics. One of the most popular examples of such an online source is Wikipedia. This public knowledge base is co-edited by millions of users from all over the world. Information in each language version of Wikipedia can be created and edited independently. Therefore, we can observe certain inconsistencies in the statements and facts described therein - depending on language and topic. In accordance with the Wikipedia content authoring guidelines, information in Wikipedia articles should be based on reliable, published sources. So, based on data from such a collaboratively edited encyclopedia, we should also be able to find important sources on specific topics. This effect can be potentially useful for people and organizations.
The reliability of a source in Wikipedia articles depends on the context, so the same source (website) may have various degrees of reliability in Wikipedia depending on topic and language version. Moreover, the reliability of the same source can change over time. The purpose of this study is to identify reliable sources on a specific topic: the COVID-19 pandemic. The analysis was carried out on real data from Wikipedia within selected language versions and within a selected time period.
Submitted 29 April, 2022;
originally announced April 2022.
-
Novel version of PageRank, CheiRank and 2DRank for Wikipedia in Multilingual Network using Social Impact
Authors:
Célestin Coquidé,
Włodzimierz Lewoniewski
Abstract:
Nowadays, information describing the navigation behaviour of internet users is used in several fields, including e-commerce, economics, sociology and data science. Such information can be extracted from different knowledge bases, including business-oriented ones. In this paper, we propose a new model for the PageRank, CheiRank and 2DRank algorithms based on the use of clickstream and pageviews data in the Google matrix construction. We used data from Wikipedia and analysed links between over 20 million articles from 11 language editions. We extracted over 1.4 billion source-destination pairs of articles from SQL dumps and more than 700 million pairs from XML dumps. Additionally, we unified the pairs based on the analysis of redirect pages and removed all duplicates. We also created a bigger network of Wikipedia articles based on all considered language versions and obtained multilingual measures. Based on real data, we discuss the differences between standard PageRank, CheiRank and 2DRank and the measures obtained with our approach, both in separate languages and in the multilingual network of Wikipedia.
Submitted 17 August, 2020; v1 submitted 9 March, 2020;
originally announced March 2020.
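For orientation, the standard link-based measures that the abstract above takes as its baseline can be sketched with a power iteration. This is only an illustration of textbook PageRank and of CheiRank (PageRank on the graph with all links reversed); it does not implement the clickstream- and pageview-based Google matrix that the paper proposes, and it omits 2DRank. The toy graph, the damping value of 0.85, the iteration count, and the function names are assumptions.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on a directed graph."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for dst in out_links[v]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def cheirank(nodes, edges, damping=0.85, iterations=100):
    """CheiRank: PageRank computed on the graph with every link reversed."""
    reversed_edges = [(dst, src) for src, dst in edges]
    return pagerank(nodes, reversed_edges, damping, iterations)


# Toy article graph (hypothetical data): C is heavily linked to, A links out most.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]

pr = pagerank(nodes, edges)
cr = cheirank(nodes, edges)
```

In this toy graph PageRank favours the article with the most incoming links, while CheiRank favours the article with the most outgoing links.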
-
The didactic potential of virtual information educational environment as a tool of geography students training
Authors:
Olga Bondarenko,
Olena Pakhomova,
Wlodzimierz Lewoniewski
Abstract:
The article clarifies the concept of the "virtual information educational environment" (VIEE) and examines researchers' views on its meaning as presented in the scientific literature. It determines the didactic potential of the VIEE for training geography students, based on an analysis of the authors' experience of blended learning using Google Classroom. It also specifies the features of the VIEE (immersion, interactivity, dynamism, sense of presence, continuity, and causality). The authors highlight the advantages of implementing a VIEE: increased efficiency of the educational process through intensified cognition and interpersonal interactive communication; continuous access to multimedia content both in Google Classroom and beyond; saving student time, since there is no need to work through the training material "manually"; availability of virtual pages of the virtual class; individualization of the educational process; formation of the informational culture of geography students; and more productive learning of the educational material through IT educational facilities. Among the disadvantages, the article mentions a low level of computerization, the insignificant quantity and low quality of software products, underestimation of the role of the VIEE in the professional training of geography students, and the lack of economic stimuli.
Submitted 18 February, 2020;
originally announced February 2020.