-
The State of Papers, Retractions, and Preprints: Evidence from the CrossRef Database (2004-2024)
Authors:
Khalid M. Saqr
Abstract:
A 20-year analysis of CrossRef metadata demonstrates that global scholarly output -- encompassing publications, retractions, and preprints -- exhibits strikingly inertial growth, well-described by exponential, quadratic, and logistic models with nearly indistinguishable goodness-of-fit. Retraction dynamics, in particular, remain stable and minimally affected by the COVID-19 shock, which contributed less than 1% to total notices. Since 2004, publications doubled every 9.8 years, retractions every 11.4 years, and preprints at the fastest rate, every 5.6 years. The findings underscore a system primed for ongoing stress at unchanged structural bottlenecks. Although model forecasts diverge beyond 2024, the evidence suggests that the future trajectory of scholarly communication will be determined by persistent systemic inertia rather than episodic disruptions -- unless intentionally redirected by policy or AI-driven reform.
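The stated doubling times translate directly into annual growth rates. Below is a minimal sketch that assumes pure exponential growth, which is only one of the three model families the abstract compares. The doubling time is estimated from a log-linear least-squares fit, T_d = ln 2 / b, where b is the fitted slope of ln(count) against year. The series is synthetic, generated with a doubling time of exactly 9.8 years; it is not CrossRef data. The function names are illustrative only.

```python
import math

def doubling_time(years, counts):
    """Fit ln(count) = a + b*year by ordinary least squares; return T_d = ln 2 / b."""
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_t = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, logs))
    sxx = sum((t - mean_t) ** 2 for t in years)
    return math.log(2) / (sxy / sxx)

def annual_growth_rate(t_double):
    """Annual growth rate implied by a doubling time, assuming exponential growth."""
    return 2 ** (1 / t_double) - 1

# Synthetic series with an exact 9.8-year doubling time (illustrative, not real data).
years = list(range(2004, 2025))
counts = [1000 * 2 ** ((y - 2004) / 9.8) for y in years]
print(round(doubling_time(years, counts), 2))  # 9.8

for label, td in [("publications", 9.8), ("retractions", 11.4), ("preprints", 5.6)]:
    print(label, f"{annual_growth_rate(td):.1%}")
```

Under this assumption, doubling every 9.8, 11.4, and 5.6 years corresponds to roughly 7.3%, 6.3%, and 13.2% growth per year.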
Submitted 26 June, 2025;
originally announced June 2025.
-
Delving Into the Psychology of Machines: Exploring the Structure of Self-Regulated Learning via LLM-Generated Survey Responses
Authors:
Leonie V. D. E. Vogelsmeier,
Eduardo Oliveira,
Kamila Misiejuk,
Sonsoles López-Pernas,
Mohammed Saqr
Abstract:
Large language models (LLMs) offer the potential to simulate human-like responses and behaviors, creating new opportunities for psychological science. In the context of self-regulated learning (SRL), if LLMs can reliably simulate survey responses at scale and speed, they could be used to test intervention scenarios, refine theoretical models, augment sparse datasets, and represent hard-to-reach populations. However, the validity of LLM-generated survey responses remains uncertain, with limited research focused on SRL and existing studies beyond SRL yielding mixed results. Therefore, in this study, we examined LLM-generated responses to the 44-item Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich & De Groot, 1990), a widely used instrument assessing students' learning strategies and academic motivation. Specifically, we used the LLMs GPT-4o, Claude 3.7 Sonnet, Gemini 2 Flash, LLaMA 3.1-8B, and Mistral Large. We analyzed item distributions, the psychological network of the theoretical SRL dimensions, and psychometric validity based on the latent factor structure. Our results suggest that Gemini 2 Flash was the most promising LLM, showing considerable sampling variability and producing underlying dimensions and theoretical relationships that align with prior theory and empirical findings. At the same time, we observed discrepancies and limitations, underscoring both the potential and current constraints of using LLMs for simulating psychological survey data and applying it in educational contexts.
Submitted 16 June, 2025;
originally announced June 2025.
-
Human-AI Collaboration or Academic Misconduct? Measuring AI Use in Student Writing Through Stylometric Evidence
Authors:
Eduardo Araujo Oliveira,
Madhavi Mohoni,
Sonsoles López-Pernas,
Mohammed Saqr
Abstract:
As human-AI collaboration becomes increasingly prevalent in educational contexts, understanding and measuring the extent and nature of such interactions pose significant challenges. This research investigates the use of authorship verification (AV) techniques not as a punitive measure, but as a means to quantify AI assistance in academic writing, with a focus on promoting transparency, interpretability, and student development. Building on prior work, we structured our investigation into three stages: dataset selection and expansion, AV method development, and systematic evaluation. Using three datasets (a public dataset, PAN-14, and two collected from University of Melbourne students across various courses), we expanded the data to include LLM-generated texts, totalling 1,889 documents and 540 authorship problems from 506 students. We developed an adapted Feature Vector Difference AV methodology to construct robust academic writing profiles for students, designed to capture meaningful, individual characteristics of their writing. The method's effectiveness was evaluated across multiple scenarios, including distinguishing between student-authored and LLM-generated texts and testing resilience against LLMs' attempts to mimic student writing styles. Results demonstrate the enhanced AV classifier's ability to identify stylometric discrepancies and measure human-AI collaboration at word and sentence levels while providing educators with a transparent tool to support academic integrity investigations. This work advances AV technology, offering actionable insights into the dynamics of academic writing in an AI-driven era.
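The general idea behind feature-vector-difference authorship verification is simple. Compute the same stylometric feature vector for two texts, then pass the element-wise absolute difference to a trained same-author/different-author classifier. The sketch below shows only the feature and difference step. It uses a hypothetical, deliberately tiny feature set: function-word frequencies, mean sentence length, and mean word length. It is not the adapted methodology or feature set from the paper, and the classifier is omitted.

```python
import re

# Hypothetical function-word list; real AV systems use far richer feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "we"]

def style_features(text):
    """Relative function-word frequencies plus mean sentence and word length."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words) or 1
    freqs = [words.count(w) / n for w in FUNCTION_WORDS]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    mean_sentence_len = len(words) / (len(sentences) or 1)
    mean_word_len = sum(len(w) for w in words) / n
    return freqs + [mean_sentence_len, mean_word_len]

def feature_vector_difference(text_a, text_b):
    """Element-wise |f(a) - f(b)|; this vector would be the classifier's input."""
    fa, fb = style_features(text_a), style_features(text_b)
    return [abs(x - y) for x, y in zip(fa, fb)]

# Hypothetical texts, not data from the study.
known = "We think that the method is sound. It is simple to apply."
questioned = "The proposed approach demonstrates considerable robustness across settings."
print([round(d, 3) for d in feature_vector_difference(known, questioned)])
```

A larger difference on a feature suggests a stylistic mismatch on that dimension. How these differences are weighted into a same-author decision would be learned by the classifier.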
Submitted 12 May, 2025;
originally announced May 2025.
-
Simulation of Non-Ordinary Consciousness
Authors:
Khalid M. Saqr
Abstract:
The symbolic architecture of non-ordinary consciousness remains largely unmapped in cognitive science and artificial intelligence. While conventional models prioritize rational coherence, altered states such as those induced by psychedelics reveal distinct symbolic regimes characterized by recursive metaphor, ego dissolution, and semantic destabilization. We present Glyph, a generative symbolic interface designed to simulate psilocybin-like symbolic cognition in large language models. Rather than modeling perception or mood, Glyph enacts symbolic transformation through recursive reentry, metaphoric modulation, and entropy-scaled destabilization -- a triadic operator formalized within a tensorial linguistic framework. Experimental comparison with baseline GPT-4o reveals that Glyph consistently generates high-entropy, metaphor-saturated, and ego-dissolving language across diverse symbolic prompt categories. These results validate the emergence of non-ordinary cognitive patterns and support a new paradigm for simulating altered consciousness through language. Glyph opens novel pathways for modeling symbolic cognition, exploring metaphor theory, and encoding knowledge in recursively altered semantic spaces.
Submitted 29 March, 2025;
originally announced March 2025.
-
Complex Dynamic Systems in Education: Beyond the Static, the Linear and the Causal Reductionism
Authors:
Mohammed Saqr,
Daryn Dever,
Sonsoles López-Pernas,
Christophe Gernigon,
Gwen Marchand,
Avi Kaplan
Abstract:
Traditional methods in educational research often fail to capture the complex and evolving nature of learning processes. This chapter examines the use of complex systems theory in education to address these limitations. The chapter covers the main characteristics of complex systems, such as non-linear relationships, emergent properties, and feedback mechanisms, to explain how educational phenomena unfold. Some of the main methodological approaches, such as network analysis and recurrence quantification analysis, are presented as ways to study relationships and patterns in learning. These have been operationalized by existing education research to study self-regulation, engagement, and academic emotions, among other learning-related constructs. Lastly, the chapter describes data collection methods that are suitable for studying learning processes through a complex systems lens.
Submitted 30 January, 2025; v1 submitted 18 December, 2024;
originally announced January 2025.
-
An XAI Social Media Platform for Teaching K-12 Students AI-Driven Profiling, Clustering, and Engagement-Based Recommending
Authors:
Nicolas Pope,
Juho Kahila,
Henriikka Vartiainen,
Mohammed Saqr,
Sonsoles Lopez-Pernas,
Teemu Roos,
Jari Laru,
Matti Tedre
Abstract:
This paper, submitted to the special track on resources for teaching AI in K-12, presents an explainable AI (XAI) education tool designed for K-12 classrooms, particularly for students in grades 4-9. The tool was designed for interventions on the fundamental processes behind social media platforms, focusing on four AI- and data-driven core concepts: data collection, user profiling, engagement metrics, and recommendation algorithms. An Instagram-like interface and a monitoring tool for explaining the data-driven processes make these complex ideas accessible and engaging for young learners. The tool provides hands-on experiments and real-time visualizations, illustrating how user actions influence both their personal experience on the platform and the experience of others. This approach seeks to enhance learners' data agency, AI literacy, and sensitivity to AI ethics. The paper includes a case example from 12 two-hour test sessions involving 209 children, using learning analytics to demonstrate how they navigated their social media feeds and the browsing patterns that emerged.
Submitted 18 December, 2024;
originally announced December 2024.
-
Transition Network Analysis: A Novel Framework for Modeling, Visualizing, and Identifying the Temporal Patterns of Learners and Learning Processes
Authors:
Mohammed Saqr,
Sonsoles López-Pernas,
Tiina Törmänen,
Rogers Kaliisa,
Kamila Misiejuk,
Santtu Tikka
Abstract:
This paper presents a novel learning analytics method, Transition Network Analysis (TNA), which integrates Stochastic Process Mining and probabilistic graph representation to model, visualize, and identify transition patterns in learning process data. Combining the relational and temporal aspects into a single lens offers capabilities beyond either framework, including centralities to capture important learning events, community detection to identify behavior patterns, and clustering to reveal temporal patterns. Furthermore, TNA introduces several significance tests that go beyond either method and add rigor to the analysis. Here, we introduce the theoretical and mathematical foundations of TNA, and we demonstrate its functionalities with a case study in which students (n=191) engaged in small-group collaboration, mapping patterns of group dynamics using the theories of co-regulation and socially shared regulation of learning. The analysis revealed that TNA can map the regulatory processes as well as identify important events, patterns, and clusters. Bootstrap validation established the significant transitions and eliminated spurious transitions. As such, TNA can capture learning dynamics and provide a robust framework for investigating the temporal evolution of learning processes. Future directions include -- inter alia -- expanding estimation methods, reliability assessment, and building longitudinal TNA.
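TNA starts from first-order transition probabilities between coded events. The sketch below assumes each group's process is a list of event codes. It estimates the row-normalised transition matrix, then applies a generic case-resampling bootstrap that keeps only transitions whose lower percentile bound exceeds a threshold. The event codes, the 0.05 threshold, and the function names are all hypothetical. This is not necessarily the paper's bootstrap procedure, nor the interface of any TNA software.

```python
import random

def transition_matrix(sequences, states):
    """Row-normalised first-order transition probabilities:
    P[a][b] = count(a -> b) / count(a -> any state)."""
    counts = {a: {b: 0 for b in states} for a in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # consecutive event pairs
            counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values())
        # A state with no outgoing transitions gets an all-zero row.
        probs[a] = {b: (counts[a][b] / total if total else 0.0) for b in states}
    return probs

def bootstrap_edges(sequences, states, n_boot=1000, alpha=0.05, threshold=0.05, seed=1):
    """Resample whole sequences with replacement, re-estimate the matrix each time,
    and keep edges whose lower percentile bound exceeds `threshold`."""
    rng = random.Random(seed)
    samples = {(a, b): [] for a in states for b in states}
    for _ in range(n_boot):
        resample = [rng.choice(sequences) for _ in sequences]
        p = transition_matrix(resample, states)
        for a, b in samples:
            samples[(a, b)].append(p[a][b])
    kept = {}
    for edge, values in samples.items():
        values.sort()
        lower = values[int(n_boot * alpha / 2)]
        upper = values[int(n_boot * (1 - alpha / 2)) - 1]
        if lower > threshold:
            kept[edge] = (lower, upper)
    return kept

# Hypothetical coded group-regulation sequences (not data from the study).
states = ["plan", "monitor", "adapt"]
sequences = [
    ["plan", "monitor", "monitor", "adapt"],
    ["plan", "monitor", "adapt", "plan"],
    ["monitor", "adapt", "plan", "monitor"],
    ["plan", "plan", "monitor", "adapt"],
]
P = transition_matrix(sequences, states)
print(P["plan"]["monitor"])  # 0.8 (4 of the 5 transitions out of "plan")
print(sorted(bootstrap_edges(sequences, states)))
```

Resampling whole sequences, rather than individual transitions, preserves the dependence among transitions that come from the same group. The rare "plan -> plan" transition occurs in only one sequence, so it falls away under resampling.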
Submitted 5 February, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
Have Learning Analytics Dashboards Lived Up to the Hype? A Systematic Review of Impact on Students' Achievement, Motivation, Participation and Attitude
Authors:
Rogers Kaliisa,
Kamila Misiejuk,
Sonsoles López-Pernas,
Mohammad Khalil,
Mohammed Saqr
Abstract:
While learning analytics dashboards (LADs) are the most common form of LA intervention, there is limited evidence regarding their impact on students' learning outcomes. This systematic review synthesizes the findings of 38 research studies to investigate the impact of LADs on students' learning outcomes, encompassing achievement, participation, motivation, and attitudes. As we currently stand, there is no evidence to support the conclusion that LADs have lived up to the promise of improving academic achievement. Most studies reported negligible or small effects, with limited evidence from well-powered controlled experiments. Many studies merely compared users and non-users of LADs, confounding the dashboard effect with student engagement levels. Similarly, the impact of LADs on motivation and attitudes appeared modest, with only a few exceptions demonstrating significant effects. Small sample sizes in these studies highlight the need for larger-scale investigations to validate these findings. Notably, LADs showed a relatively substantial impact on student participation. Several studies reported medium to large effect sizes, suggesting that LADs can promote engagement and interaction in online learning environments. However, methodological shortcomings, such as reliance on traditional evaluation methods, self-selection bias, the assumption that access equates to usage, and a lack of standardized assessment tools, emerged as recurring issues. To advance the research line for LADs, researchers should use rigorous assessment methods and establish clear standards for evaluating learning constructs. Such efforts will advance our understanding of the potential of LADs to enhance learning outcomes and provide valuable insights for educators and researchers alike.
Submitted 22 December, 2023;
originally announced December 2023.
-
A modern approach to transition analysis and process mining with Markov models: A tutorial with R
Authors:
Jouni Helske,
Satu Helske,
Mohammed Saqr,
Sonsoles López-Pernas,
Keefe Murphy
Abstract:
This chapter presents an introduction to Markovian modeling for the analysis of sequence data. Contrary to the deterministic approach seen in the previous sequence analysis chapters, Markovian models are probabilistic models, focusing on the transitions between states instead of studying sequences as a whole. The chapter provides an introduction to this method and differentiates between its most common variations: first-order Markov models, hidden Markov models, mixture Markov models, and mixture hidden Markov models. In addition to a thorough explanation and contextualization within the existing literature, the chapter provides a step-by-step tutorial on how to implement each type of Markovian model using the R package seqHMM. The chapter also provides a complete guide to performing stochastic process mining with Markovian models as well as plotting, comparing and clustering different process models.
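To make the difference between first-order and hidden Markov models concrete, the sketch below computes the likelihood of an observed sequence under a small discrete HMM using the forward algorithm. It checks the result against brute-force enumeration over all hidden paths. This is a plain-Python illustration with hypothetical parameters and state labels. It is not the seqHMM interface; seqHMM itself is an R package.

```python
from itertools import product

def forward_likelihood(obs, init, trans, emit):
    """P(obs) under a discrete hidden Markov model, via the forward algorithm.
    init[i] = P(z_1 = i); trans[i][j] = P(z_{t+1} = j | z_t = i);
    emit[i][o] = P(x_t = o | z_t = i)."""
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

def brute_force_likelihood(obs, init, trans, emit):
    """Same quantity by summing over every hidden path (exponential cost; for checking)."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total

# Hypothetical 2-state model: hidden 0 = "engaged", 1 = "disengaged";
# observed 0 = "active", 1 = "idle".
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1]
print(round(forward_likelihood(obs, init, trans, emit), 5))      # 0.13623
print(round(brute_force_likelihood(obs, init, trans, emit), 5))  # 0.13623
```

With an identity emission matrix the hidden states are observed directly, and the model reduces to a first-order Markov chain. Its likelihood is then just the initial probability times the product of the transition probabilities along the sequence.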
Submitted 2 September, 2023;
originally announced September 2023.
-
Temporal network analysis: Introduction, methods and detailed tutorial with R
Authors:
Mohammed Saqr
Abstract:
Learning involves relations, interactions and connections between learners, teachers and the world at large. Such interactions are essentially temporal and unfold in time. Yet, researchers have rarely combined the two aspects (the temporal and the relational) in an analytics framework. Temporal networks allow modeling of temporal learning processes, i.e., the emergence and flow of activities, communities, and social processes, through fine-grained dynamic analysis. This can provide insights into phenomena like knowledge co-construction, information flow, and relationship building. This chapter introduces the basic concepts of temporal networks, their types, and techniques. It then provides a detailed guide to temporal network analysis, covering network construction, visualization, and mathematical analysis at the node and graph levels. The analysis is performed with a real-world dataset. The discussion section offers some extra resources for readers who want to expand their knowledge of the technique.
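A core idea in temporal networks is that a path only counts if its interactions happen in time order. The sketch below assumes interactions are directed, time-stamped contacts (u, v, t), and that a valid path needs strictly increasing timestamps. Under those assumptions, earliest-arrival times from one node can be computed in a single pass over the time-sorted contacts. The result is compared with reachability in the aggregated static network. Node names and timestamps are hypothetical.

```python
import math

def earliest_arrival(contacts, source):
    """Earliest time each node is reachable from `source` via a time-respecting
    path (strictly increasing timestamps). contacts: iterable of (u, v, t)."""
    arrival = {source: -math.inf}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        # Usable only if u was reached strictly before t and this improves v.
        if arrival.get(u, math.inf) < t < arrival.get(v, math.inf):
            arrival[v] = t
    return arrival

def static_reachable(contacts, source):
    """Reachability in the aggregated (time-ignoring) directed network."""
    adjacency = {}
    for u, v, _ in contacts:
        adjacency.setdefault(u, set()).add(v)
    seen, stack = {source}, [source]
    while stack:
        for w in adjacency.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

# Hypothetical forum replies: (sender, receiver, time step).
contacts = [("A", "B", 1), ("B", "C", 2), ("C", "D", 0)]
print(sorted(static_reachable(contacts, "A")))  # ['A', 'B', 'C', 'D']
print(earliest_arrival(contacts, "A"))          # {'A': -inf, 'B': 1, 'C': 2}
```

The static network suggests that A can reach D. However, the C -> D interaction happened before anything from A reached C, so no time-respecting path exists. Collapsing time away can therefore overstate information flow.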
Submitted 23 July, 2023;
originally announced July 2023.
-
A Novel Kuhnian Ontology for Epistemic Classification of STM Scholarly Articles
Authors:
Khalid M. Saqr,
Abdelrahman Elsharawy
Abstract:
Thomas Kuhn proposed his paradigmatic view of scientific discovery five decades ago. The concept of paradigm has not only explained the progress of science, but has also become the central epistemic concept among STM scientists. Here, we adopt the principles of Kuhnian philosophy to construct a novel ontology that aims at classifying and evaluating the impact of STM scholarly articles. First, we explain how the Kuhnian cycle of science describes research at different epistemic stages. Second, we show how the Kuhnian cycle could be reconstructed into modular ontologies which classify scholarly articles according to their contribution to paradigm-centred knowledge. The proposed ontology and its scenarios are discussed. To the best of the authors' knowledge, this is the first attempt to create an ontology for describing scholarly articles based on the Kuhnian paradigmatic view of science.
Submitted 9 February, 2020;
originally announced February 2020.