Showing 1–12 of 12 results for author: Kozlowski, D

Searching in archive cs.
  1. arXiv:2504.16264

    cs.IR cs.CL

    CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents

    Authors: Francisco Valentini, Diego Kozlowski, Vincent Larivière

    Abstract: Cross-lingual information retrieval (CLIR) consists in finding relevant documents in a language that differs from the language of the queries. This paper presents CLIRudit, a new dataset created to evaluate cross-lingual academic search, focusing on English queries and French documents. The dataset is built using bilingual article metadata from Érudit, a Canadian publishing platform, and is design…

    Submitted 22 April, 2025; originally announced April 2025.

  2. arXiv:2502.13934

    cs.DL

    Citation proximus: the role of social and semantic ties in citing behaviour

    Authors: Diego Kozlowski, Carolina Pradier, Pierre Benz, Natsumi Shokida, Jens Peter Andersen, Vincent Larivière

    Abstract: Citations are a key indicator of research impact but are shaped by factors beyond intrinsic research quality, including prestige, social networks, and thematic similarity. While the Matthew Effect explains how prestige accumulates and influences citation distributions, our study contextualizes this by showing that other mechanisms also play a crucial role. Analyzing a large dataset of disambiguate…

    Submitted 19 February, 2025; originally announced February 2025.

  3. arXiv:2502.03627

    cs.CL

    Sorting the Babble in Babel: Assessing the Performance of Language Detection Algorithms on the OpenAlex Database

    Authors: Maxime Holmberg Sainte-Marie, Diego Kozlowski, Lucía Céspedes, Vincent Larivière

    Abstract: This project aims to compare various language classification procedures, procedures combining various Python language detection algorithms and metadata-based corpora extracted from manually-annotated articles sampled from the OpenAlex database. Following an analysis of precision and recall performance for each algorithm, corpus, and language as well as of processing speeds recorded for each algori…

    Submitted 18 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

    Comments: 33 pages, 4 figures

  4. arXiv:2411.18306

    cs.DL cs.IR

    Delineating Feminist Studies through bibliometric analysis

    Authors: Natsumi S. Shokida, Diego Kozlowski, Vincent Larivière

    Abstract: The multidisciplinary and socially anchored nature of Feminist Studies presents unique challenges for bibliometric analysis, as this research area transcends traditional disciplinary boundaries and reflects discussions from feminist and LGBTQIA+ social movements. This paper proposes a novel approach for identifying gender/sex related publications scattered across diverse scientific disciplines. Us…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: 2 tables, 5 figures

  5. arXiv:2409.10633

    cs.DL cs.DB

    Evaluating the Linguistic Coverage of OpenAlex: An Assessment of Metadata Accuracy and Completeness

    Authors: Lucía Céspedes, Diego Kozlowski, Carolina Pradier, Maxime Holmberg Sainte-Marie, Natsumi Solange Shokida, Pierre Benz, Constance Poitras, Anton Boudreau Ninkov, Saeideh Ebrahimy, Philips Ayeni, Sarra Filali, Bing Li, Vincent Larivière

    Abstract: Clarivate's Web of Science (WoS) and Elsevier's Scopus have been for decades the main sources of bibliometric information. Although highly curated, these closed, proprietary databases are largely biased towards English-language publications, underestimating the use of other languages in research dissemination. Launched in 2022, OpenAlex promised comprehensive, inclusive, and open-source research i…

    Submitted 19 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: Reference list updated and corrected, corresponding author's email contact added, minor typographical errors corrected

  6. arXiv:2408.07003

    cs.CL cs.AI

    Generative AI for automatic topic labelling

    Authors: Diego Kozlowski, Carolina Pradier, Pierre Benz

    Abstract: Topic Modeling has become a prominent tool for the study of scientific fields, as it allows for a large-scale interpretation of research trends. Nevertheless, the output of these models is structured as a list of keywords, which requires manual interpretation for the labelling. This paper proposes to assess the reliability of three LLMs, namely flan, GPT-4o, and GPT-4 mini for topic labelling. D…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 1 figure

  7. arXiv:2407.18783

    cs.DL

    Science for whom? The influence of the regional academic circuit on gender inequalities in Latin America

    Authors: Carolina Pradier, Diego Kozlowski, Natsumi S. Shokida, Vincent Larivière

    Abstract: The Latin-American scientific community has achieved significant progress towards gender parity, with nearly equal representation of women and men scientists. Nevertheless, women continue to be underrepresented in scholarly communication. Throughout the 20th century, Latin America established its academic circuit, focusing on research topics of regional significance. Through an analysis of scienti…

    Submitted 28 November, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

  8. arXiv:2402.04391

    cs.DL cs.CY

    The Howard-Harvard effect: Institutional reproduction of intersectional inequalities

    Authors: Diego Kozlowski, Thema Monroe-White, Vincent Larivière, Cassidy R. Sugimoto

    Abstract: The US higher education system concentrates the production of science and scientists within a few institutions. This has implications for minoritized scholars and the topics with which they are disproportionately associated. This paper examines topical alignment between institutions and authors of varying intersectional identities, and the relationship with prestige and scientific impact. We obser…

    Submitted 6 February, 2024; originally announced February 2024.

  9. arXiv:2104.12553

    cs.CY cs.IR physics.soc-ph

    Avoiding bias when inferring race using name-based approaches

    Authors: Diego Kozlowski, Dakota S. Murray, Alexis Bell, Will Hulsey, Vincent Larivière, Thema Monroe-White, Cassidy R. Sugimoto

    Abstract: Racial disparity in academia is a widely acknowledged problem. The quantitative understanding of racial based systemic inequalities is an important step towards a more equitable research system. However, because of the lack of robust information on authors' race, few large scale analyses have been performed on this topic. Algorithmic approaches offer one solution, using known information about aut…

    Submitted 12 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Journal ref: PLOS ONE 17(3) e0264270 (2022)

  10. Gender bias in magazines oriented to men and women: a computational approach

    Authors: Diego Kozlowski, Gabriela Lozano, Carla M. Felcher, Fernando Gonzalez, Edgar Altszyler

    Abstract: Cultural products are a source to acquire individual values and behaviours. Therefore, the differences in the content of the magazines aimed specifically at women or men are a means to create and reproduce gender stereotypes. In this study, we compare the content of a women-oriented magazine with that of a men-oriented one, both produced by the same editorial group, over a decade (2008-2018). With…

    Submitted 24 November, 2020; originally announced November 2020.

    Journal ref: Feminist Media Studies (2022)

  11. arXiv:2011.02887

    cs.SI cs.LG physics.soc-ph

    Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation

    Authors: Diego Kozlowski, Jennifer Dusdal, Jun Pang, Andreas Zilian

    Abstract: Over the last century, we observe a steady and exponential growth of scientific publications globally. The overwhelming amount of available literature makes a holistic analysis of the research within a field and between fields based on manual inspection impossible. Automatic techniques to support the process of literature review are required to find the epistemic and social patterns that are emb…

    Submitted 5 November, 2020; originally announced November 2020.

    Journal ref: Scientometrics 126, 2021

  12. arXiv:2009.07727

    physics.soc-ph cs.CY econ.GN

    Latent Dirichlet Allocation Models for World Trade Analysis

    Authors: Diego Kozlowski, Viktoriya Semeshenko, Andrea Molinari

    Abstract: International trade is one of the classic areas of study in economics. Nowadays, given the availability of data, the tools used for the analysis can be complemented and enriched with new methodologies and techniques that go beyond the traditional approach. The present paper shows the application of the Latent Dirichlet Allocation Models, a well-known technique from the area of Natural Language…

    Submitted 16 September, 2020; originally announced September 2020.

    Journal ref: PLOS ONE (2021) 16(2): e0245393