-
Automated Knowledge Component Generation and Knowledge Tracing for Coding Problems
Authors:
Zhangqi Duan,
Nigel Fernandez,
Arun Balajiee Lekshmi Narayanan,
Mohammad Hassany,
Rafaella Sampaio de Alencar,
Peter Brusilovsky,
Bita Akram,
Andrew Lan
Abstract:
Knowledge components (KCs) mapped to problems help model student learning, tracking their mastery levels on fine-grained skills thereby facilitating personalized learning and feedback in online learning platforms. However, crafting and tagging KCs to problems, traditionally performed by human domain experts, is highly labor-intensive. We present a fully automated, LLM-based pipeline for KC generation and tagging for open-ended programming problems. We also develop an LLM-based knowledge tracing (KT) framework to leverage these LLM-generated KCs, which we refer to as KCGen-KT. We conduct extensive quantitative and qualitative evaluations on a real-world student code submission dataset. We find that KCGen-KT outperforms existing KT methods and human-written KCs on future student response prediction. We investigate the learning curves of generated KCs and show that LLM-generated KCs result in a better fit than human-written KCs under a cognitive model. We also conduct a human evaluation with course instructors to show that our pipeline generates reasonably accurate problem-KC mappings.
Submitted 23 May, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
-
Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification
Authors:
Jan Cegin,
Branislav Pecher,
Jakub Simko,
Ivan Srba,
Maria Bielikova,
Peter Brusilovsky
Abstract:
Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more "informed") sample selection strategies is lacking. In this work, we compare sample selection strategies existing in the few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate this on in-distribution and out-of-distribution classifier performance. Results indicate that while some "informed" selection strategies increase the performance of models, especially for out-of-distribution data, this happens only seldom and with marginal performance increases. Unless further advances are made, a default of random sample selection remains a good option for augmentation practitioners.
Submitted 14 October, 2024;
originally announced October 2024.
-
LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?
Authors:
Jan Cegin,
Jakub Simko,
Peter Brusilovsky
Abstract:
Generative large language models (LLMs) are increasingly being used for data augmentation tasks, where text samples are LLM-paraphrased and then used for classifier fine-tuning. However, research that would confirm a clear cost-benefit advantage of LLMs over more established augmentation methods is largely missing. To study whether (and when) LLM-based augmentation is advantageous, we compared the effects of recent LLM augmentation methods with established ones on 6 datasets, 3 classifiers and 2 fine-tuning methods. We also varied the number of seeds and collected samples to better explore the downstream model accuracy space. Finally, we performed a cost-benefit analysis and show that LLM-based methods are worthy of deployment only when a very small number of seeds is used. Moreover, in many cases, established methods lead to similar or better model accuracies.
Submitted 29 August, 2024;
originally announced August 2024.
-
Explaining Code Examples in Introductory Programming Courses: LLM vs Humans
Authors:
Arun-Balajiee Lekshmi-Narayanan,
Priti Oli,
Jeevan Chapagain,
Mohammad Hassany,
Rabin Banjade,
Peter Brusilovsky,
Vasile Rus
Abstract:
Worked examples, which present explained code for solving typical programming problems, are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide explanations for the many examples typically used in a programming class. In this paper, we assess the feasibility of using LLMs to generate code explanations for passive and active example exploration systems. To achieve this goal, we compare the code explanations generated by ChatGPT with the explanations generated by both experts and students.
Submitted 11 March, 2024; v1 submitted 8 December, 2023;
originally announced March 2024.
-
Human-AI Co-Creation of Worked Examples for Programming Classes
Authors:
Mohammad Hassany,
Peter Brusilovsky,
Jiaze Ke,
Kamil Akhuseyinoglu,
Arun Balajiee Lekshmi Narayanan
Abstract:
Worked examples (solutions to typical programming problems, presented as source code in a certain language and used to explain the topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.
Submitted 29 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation
Authors:
Jan Cegin,
Branislav Pecher,
Jakub Simko,
Ivan Srba,
Maria Bielikova,
Peter Brusilovsky
Abstract:
The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune downstream models. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on the lexical diversity of the generated texts and on downstream model performance. We compare the effects over 5 different LLMs, 6 datasets and 2 downstream models. We show that diversity is most increased by taboo words, but downstream model performance is highest with hints.
Submitted 18 August, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
Authoring Worked Examples for Java Programming with Human-AI Collaboration
Authors:
Mohammad Hassany,
Peter Brusilovsky,
Jiaze Ke,
Kamil Akhuseyinoglu,
Arun Balajiee Lekshmi Narayanan
Abstract:
Worked examples (solutions to typical programming problems, presented as source code in a certain language and used to explain the topics from a programming class) are among the most popular types of learning content in programming classes. Most approaches and tools for presenting these examples to students are based on line-by-line explanations of the example code. However, instructors rarely have time to provide line-by-line explanations for the large number of examples typically used in a programming class. In this paper, we explore and assess a human-AI collaboration approach to authoring worked examples for Java programming. We introduce an authoring system for creating Java worked examples that generates a starting version of code explanations and presents it to the instructor to edit if necessary. We also present a study that assesses the quality of explanations created with this approach.
Submitted 4 December, 2023;
originally announced December 2023.
-
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Authors:
Jan Cegin,
Jakub Simko,
Peter Brusilovsky
Abstract:
The emergence of generative large language models (LLMs) raises the question: what will be its impact on crowdsourcing? Traditionally, crowdsourcing has been used for acquiring solutions to a wide variety of human-intelligence tasks, including ones involving text generation, modification or evaluation. For some of these tasks, models like ChatGPT can potentially substitute for human workers. In this study, we investigate whether this is the case for the task of paraphrase generation for intent classification. We apply the data collection methodology of an existing crowdsourcing study (similar scale, prompts and seed data) using ChatGPT and Falcon-40B. We show that ChatGPT-created paraphrases are more diverse and lead to at least as robust models.
Submitted 19 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Knowledge Tracing for Complex Problem Solving: Granular Rank-Based Tensor Factorization
Authors:
Chunpai Wang,
Shaghayegh Sahebi,
Siqian Zhao,
Peter Brusilovsky,
Laura O. Moraes
Abstract:
Knowledge Tracing (KT), which aims to model student knowledge level and predict their performance, is one of the most important applications of user modeling. Modern KT approaches model and maintain an up-to-date state of student knowledge over a set of course concepts according to students' historical performance in attempting the problems. However, KT approaches were designed to model knowledge by observing relatively small problem-solving steps in Intelligent Tutoring Systems. While these approaches were applied successfully to model student knowledge by observing student solutions for simple problems, they do not perform well for modeling complex problem solving in students. Most importantly, current models assume that all problem attempts are equally valuable in quantifying current student knowledge. However, for complex problems that involve many concepts at the same time, this assumption is deficient. In this paper, we argue that not all attempts are equivalently important in discovering students' knowledge state, and some attempts can be summarized together to better represent student performance. We propose a novel student knowledge tracing approach, Granular RAnk based TEnsor factorization (GRATE), that dynamically selects student attempts that can be aggregated while predicting students' performance in problems and discovering the concepts presented in them. Our experiments on three real-world datasets demonstrate the improved performance of GRATE, compared to the state-of-the-art baselines, in the task of student performance prediction. Our further analysis shows that attempt aggregation eliminates the unnecessary fluctuations from students' discovered knowledge states and helps in discovering complex latent concepts in the problems.
Submitted 6 October, 2022;
originally announced October 2022.
-
From Ranked Lists to Carousels: A Carousel Click Model
Authors:
Behnam Rahdari,
Branislav Kveton,
Peter Brusilovsky
Abstract:
Carousel-based recommendation interfaces allow users to explore recommended items in a structured, efficient, and visually-appealing way. This made them a de-facto standard approach to recommending items to end users in many real-life recommenders. In this work, we try to explain the efficiency of carousel recommenders using a carousel click model, a generative model of user interaction with carousel-based recommender interfaces. We study this model both analytically and empirically. Our analytical results show that the user can examine more items in the carousel click model than in a single ranked list, due to the structured way of browsing. These results are supported by a series of experiments, where we integrate the carousel click model with a recommender based on matrix factorization. We show that the combined recommender performs well on held-out test data, and leads to higher engagement with recommendations than a traditional single ranked list.
Submitted 27 September, 2022;
originally announced September 2022.
-
Concept Annotation for Intelligent Textbooks
Authors:
Mengdi Wang,
Hung Chau,
Khushboo Thaker,
Peter Brusilovsky,
Daqing He
Abstract:
With the increased popularity of electronic textbooks, there is growing interest in developing a new generation of "intelligent textbooks", which have the ability to guide readers according to their learning goals and current knowledge. Intelligent textbooks extend regular textbooks by integrating machine-manipulatable knowledge such as a knowledge map or a prerequisite-outcome relationship between sections; the most popular form of integrated knowledge is a list of unique knowledge concepts associated with each section. With the help of these concepts, multiple intelligent operations, such as content linking, content recommendation or student modeling, can be performed. However, annotating a reliable set of concepts to a textbook section is a challenge. Automatic unsupervised methods for extracting key-phrases as the concepts are known to have insufficient accuracy. Manual annotation by experts is considered the preferred approach and can be used to produce both the target outcome and the labeled data for training supervised models. However, most researchers in the education domain still treat the concept annotation process as an ad-hoc activity rather than an engineering task, resulting in low-quality annotated data. In this paper, we present a textbook knowledge engineering method to obtain reliable concept annotations. The outcomes of our work include a validated knowledge engineering procedure, a code-book for technical concept annotation, and a set of concept annotations for the target textbook, which could be used as a gold standard in further research.
Submitted 9 June, 2020; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Does Order Matter? An Empirical Study on Generating Multiple Keyphrases as a Sequence
Authors:
Rui Meng,
Xingdi Yuan,
Tong Wang,
Peter Brusilovsky,
Adam Trischler,
Daqing He
Abstract:
Recently, concatenating multiple keyphrases as a target sequence has been proposed as a new learning paradigm for keyphrase generation. Existing studies concatenate target keyphrases in different orders but no study has examined the effects of ordering on models' behavior. In this paper, we propose several orderings for concatenation and inspect the important factors for training a successful keyphrase generation model. By running comprehensive comparisons, we observe one preferable ordering and summarize a number of empirical findings and challenges, which can shed light on future research on this line of work.
Submitted 28 February, 2022; v1 submitted 8 September, 2019;
originally announced September 2019.
-
Sequence Analysis of Learning Behavior in Different Consecutive Activities
Authors:
Abdulelah Abuabat,
Peter Brusilovsky
Abstract:
The purpose of this research is to study the possibility of identifying students, statistically, by analyzing their behavior in different consecutive activities. In this project, there are three different sorts of activities: animated examples, basic examples, and parameterized exercises. We extracted the behavior of each student from the activity logs of the Mastery Grids platform. Additionally, we investigate, using an unsupervised learning technique, whether students share common patterns while performing these activities. We conclude that we are able to identify students from their behavior, and that there are some common patterns.
Submitted 4 April, 2019;
originally announced April 2019.
-
One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases
Authors:
Xingdi Yuan,
Tong Wang,
Rui Meng,
Khushboo Thaker,
Peter Brusilovsky,
Daqing He,
Adam Trischler
Abstract:
Different texts shall by nature correspond to different numbers of keyphrases. This desideratum is largely missing from existing neural keyphrase generation models. In this study, we address this problem from both modeling and evaluation perspectives.
We first propose a recurrent generative model that generates multiple keyphrases as delimiter-separated sequences. Generation diversity is further enhanced with two novel techniques by manipulating decoder hidden states. In contrast to previous approaches, our model is capable of generating diverse keyphrases and controlling the number of outputs.
We further propose two evaluation metrics tailored towards the variable-number generation. We also introduce a new dataset StackEx that expands beyond the only existing genre (i.e., academic writing) in keyphrase generation tasks. With both previous and new evaluation metrics, our model outperforms strong baselines on all datasets.
Submitted 12 May, 2020; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Deep Keyphrase Generation
Authors:
Rui Meng,
Sanqiang Zhao,
Shuguang Han,
Daqing He,
Peter Brusilovsky,
Yu Chi
Abstract:
Keyphrases provide highly-condensed information that can be effectively used for understanding, organizing and retrieving text content. Though previous studies have provided many workable solutions for automated keyphrase extraction, they commonly divided the to-be-summarized content into multiple text chunks, then ranked and selected the most meaningful ones. These approaches could neither identify keyphrases that do not appear in the text, nor capture the real semantic meaning behind the text. We propose a generative model for keyphrase prediction with an encoder-decoder framework, which can effectively overcome the above drawbacks. We name it deep keyphrase generation since it attempts to capture the deep semantic meaning of the content with a deep learning method. Empirical analysis on six datasets demonstrates that our proposed model not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but also can generate absent keyphrases based on the semantic meaning of the text. Code and dataset are available at https://github.com/memray/OpenNMT-kpg-release.
Submitted 31 May, 2021; v1 submitted 23 April, 2017;
originally announced April 2017.